Last Week in AI Security — Week of February 23, 2026

Executive Summary

This week witnessed the collision of AI security policy and technical reality. On February 28, the U.S. Department of Defense declared Anthropic a national security supply chain risk following a public dispute over usage restrictions, immediately banning all military contractors from conducting commercial activity with the AI firm. This unprecedented move signals a hardening regulatory stance on AI vendor autonomy and sets a precedent for how governments may wield procurement power to enforce control over frontier AI deployments.

Meanwhile, technical vulnerabilities continued to expose the fragility of the AI software supply chain. CrowdStrike’s 2026 Global Threat Report revealed that average eCrime breakout time fell to just 29 minutes in 2025, with the fastest observed breakout occurring in only 27 seconds, demonstrating how AI acceleration is compressing adversary timelines to sub-human response speeds. A new critical vulnerability in vLLM (CVE-2026-22778) enables remote code execution on vulnerable deployments by submitting a malicious video link to the API, while PyTorch’s CVE-2025-32434 affects versions 2.5.1 and prior, specifically bypassing the torch.load() function’s weights_only=True parameter with a critical CVSS v4 score of 9.3.

The week also saw major advances in AI security frameworks and research. MITRE ATLAS received its first 2026 update with contributions from Zenity researchers, introducing new agentic AI attack techniques, while the Gray Swan Indirect Prompt Injection Challenge Q1 2026 launched with a $40,000 prize pool sponsored by UK AISI, OpenAI, Anthropic, Amazon, Meta, and Google DeepMind. These developments underscore a critical inflection point: as AI systems become more capable and autonomous, the attack surface expands faster than defensive capabilities can adapt.

Top Stories

Pentagon Declares Anthropic National Security Supply Chain Risk

Defense Secretary Pete Hegseth deemed artificial intelligence firm Anthropic a “supply chain risk to national security” on Friday, February 28, following days of increasingly heated public conflict over the company’s effort to place guardrails on the Pentagon’s use of its technology. Effective immediately, “no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic”.

The designation stems from a dispute over usage restrictions. The conflict centers around Anthropic’s push for guardrails that would explicitly prevent the military from using its powerful Claude AI model to conduct mass surveillance on Americans or to power fully autonomous weapons. The Pentagon, for its part, demanded the ability to use Claude for “all lawful purposes”. The Pentagon had given Anthropic a deadline of Friday at 5:01 p.m. to either reach an agreement or lose out on its lucrative contracts with the military.

Anthropic was awarded a $200 million contract from the Pentagon last July to develop AI capabilities that would advance national security. The decision could have wide-ranging ripple effects across the defense industrial base, as the sheer number of companies that contract with the Pentagon means many firms will need to immediately reassess their AI vendor relationships.

The move represents an aggressive use of procurement power to enforce control over AI deployment terms. While Anthropic CEO Dario Amodei has emphasized the company’s patriotic intent, arguing that guardrails are necessary because Claude is not infallible enough to power fully autonomous weapons and a powerful AI model could raise serious privacy concerns, the Pentagon’s position prevailed. Security practitioners must now anticipate that AI vendor relationships—particularly with frontier model providers—may carry geopolitical risk that can materialize with little warning.

CrowdStrike: AI Accelerates Adversaries to 29-Minute Breakout Time

CrowdStrike today released its 2026 Global Threat Report, revealing that AI is accelerating the adversary and expanding the enterprise attack surface. The data paints a stark picture of how AI is compressing intrusion timelines to unprecedented speeds.

The average eCrime breakout time fell to just 29 minutes in 2025, with the fastest observed breakout occurring in only 27 seconds. Breakout time measures how quickly an attacker moves from initial access to lateral movement—the moment when containment becomes exponentially harder. Twenty-nine minutes is barely enough time to convene an incident response team, let alone execute a coordinated defense.

Adversaries are also actively exploiting AI systems themselves, injecting malicious prompts into GenAI tools at more than 90 organizations and abusing AI development platforms. This dual threat—AI-enabled attackers and AI systems as targets—creates a vicious cycle where defensive tools and attack capabilities evolve in lockstep.

Zero Day and Cloud Exploitation Grows: 42% of vulnerabilities were exploited before public disclosure as adversaries weaponized zero days for initial access, remote code execution, and privilege escalation. Cloud-conscious intrusions rose by 37% overall, with a 266% increase from state-nexus threat actors targeting cloud environments for intelligence collection.

Adam Meyers, head of counter adversary operations at CrowdStrike, stated: “This is an AI arms race. Breakout time is the clearest signal of how intrusion has changed. Adversaries are moving from initial access to lateral movement in minutes. AI is compressing the time between intent and execution while turning enterprise AI systems into targets”.

The implications are clear: manual incident response workflows cannot match adversary speed. Organizations must shift to automated detection and response operating at machine speed, while simultaneously hardening AI infrastructure against exploitation.

Critical vLLM Remote Code Execution Vulnerability

CVE-2026-22778 enables remote code execution on vulnerable vLLM deployments by submitting a malicious video link to the API. vLLM is a high-throughput, memory-efficient engine designed for serving Large Language Models (LLMs). It enables running LLMs on your servers faster, cheaper, and more efficiently than other general-purpose local runners like Ollama, especially under heavy concurrent workloads.

Any organization using vLLM and exposing a video model for user input is at risk. This RCE can be used for a full server takeover, including arbitrary command execution, data exfiltration, and lateral movement.

The vulnerability is a chained exploit. vLLM uses OpenCV to decode videos. OpenCV bundles FFmpeg 5.1.x, which contains a heap overflow in the JPEG2000 decoder. Because OpenCV is used for video decoding, constructing a video from JPEG2000 frames can reach this vulnerability and lead to command execution.

Update vLLM to the latest version that includes the fix (0.14.1). Organizations that cannot immediately patch should disable video model features in production until remediation is complete. OX customers affected by this issue were informed to update their vLLM version.

The vLLM case illustrates a recurring pattern in AI infrastructure security: popular inference engines like vLLM, Ollama, and TensorRT-LLM sit in the critical path of production AI deployments but often lack the security scrutiny of their upstream dependencies. Supply chain composition risk—where one library’s vulnerability cascades through multiple downstream projects—remains a significant blindspot.

Framework & Standards Updates

NIST Cyber AI Profile Comment Period Closes January 30

The Cyber AI Profile (NIST Community Profile) had its comment period close on January 30th. The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) released an initial preliminary draft of the Cybersecurity Framework Profile for Artificial Intelligence (Cyber AI Profile or NIST IR 8596).

The preliminary draft is designed as a voluntary framework that would extend the recently updated NIST Cybersecurity Framework (CSF) 2.0 to new cybersecurity risks and opportunities introduced by AI and to also complement NIST’s AI Risk Management Framework (AI RMF). The preliminary draft of the Cyber AI Profile is organized around: Three Focus Areas: Secure (securing AI systems); Defend (conducting AI-enabled cyber defense); and Thwart (thwarting adversarial cyberattacks using AI). Six CSF 2.0 Core Functions: Govern, Identify, Protect, Detect, Respond, and Recover.

In a separate but related release, NIST also made available a discussion draft covering “Control Overlays for Securing AI Systems” including “Overview and Methodology” (NIST IR 8605) and “Using and Fine-Tuning Predictive AI” (NIST IR 8605A), which will serve as complements to the Cyber AI Profile.

The comment period closed on January 30, meaning the final Cyber AI Profile is expected to be published in the coming weeks. Organizations should prepare to adopt the profile as a foundational reference for AI-specific cybersecurity controls.

MITRE ATLAS Receives First 2026 Update with Agentic AI Techniques

In the first MITRE ATLAS update of 2026, Zenity researchers contributed substantially to expanding the framework’s coverage of agentic AI threats. ATLAS added 14 new techniques in 2025 for AI agents, covering risks like prompt injection and memory manipulation attacks.

A new technique documents how attackers can exploit AI service APIs as part of broader attack chains by leveraging existing infrastructure that allows bad actors to live off the land, maintaining stealth operations and maintaining persistent access for espionage, reconnaissance, and more. Zenity also contributed to a new MITRE ATLAS case study: SesameOp (AML.CS0042), which documents a novel backdoor technique leveraging the OpenAI Assistants API for command and control.

MITRE ATLAS catalogs 15 tactics, 66 techniques, and 46 sub-techniques specifically targeting AI and machine learning systems as of October 2025. The framework is seeing increased adoption as a complement to traditional ATT&CK matrices, with ATLAS complements rather than competes with OWASP LLM Top 10 and NIST AI RMF — use all three for comprehensive coverage.

OWASP LLM Top 10 2025 Released; Agentic Top 10 Published

While the OWASP LLM Top 10 2025 was released earlier (February 16), this week saw continued discussion of its implications for agentic systems. The OWASP Top 10 for Large Language Model Applications started in 2023 as a community-driven effort to highlight and address security issues specific to AI applications. Since then, the technology has continued to spread across industries and applications, and so have the associated risks. As LLMs are embedded more deeply in everything from customer interactions to internal operations, developers and security professionals are discovering new vulnerabilities—and ways to counter them.

A separate framework, the OWASP Top 10 for Agentic Applications 2026, has also been published. OWASP Top 10 for Agentic Applications 2026 is a practitioner-oriented security compass for organizations deploying autonomous AI agents. It adapts the familiar OWASP Top 10 format to agentic systems that plan, delegate, and act across tools, identities, and other agents. The document identifies the ten highest-impact agent-specific risks, from goal hijacking and tool misuse to rogue agents and cascading failures.

No significant updates to ISO 42001 or other major AI governance standards were reported this week.

Vulnerability Watch

CVE-2025-32434: PyTorch Remote Code Execution via weights_only Bypass

A critical Remote Command Execution (RCE) vulnerability has been discovered in PyTorch, tracked as CVE-2025-32434. The vulnerability affects PyTorch versions 2.5.1 and prior, specifically in the torch.load() function when used with weights_only=True parameter. This security flaw was discovered by security researcher Ji’an Zhou and has been assigned a critical CVSS v4 score of 9.3.

The vulnerability exists in PyTorch’s model loading functionality, specifically when using the torch.load() function with weights_only=True parameter. This parameter was previously considered a security safeguard, but the researcher proved that it could still be exploited to achieve remote code execution. The vulnerability has been classified under CWE-502 (Deserialization of Untrusted Data).

If successfully exploited, this vulnerability allows attackers to execute arbitrary commands on the target machine. This could potentially lead to data breaches, system compromise, or lateral movement in cloud-hosted AI environments. The impact is particularly severe because many developers trust weights_only=True as a security measure.

The primary mitigation is to update PyTorch to version 2.6.0 or higher, which contains the patch for this vulnerability. The team responsible for developing the PyTorch framework released its update 2.6.0, in which the vulnerability CVE-2025-32434 was successfully fixed. All previous versions – up to 2.5.1 – remain vulnerable and should be updated as soon as possible.

This vulnerability is particularly concerning because it undermines a fundamental security assumption in the ML ecosystem: that model weights can be safely loaded in isolation from untrusted sources when using the weights_only parameter. Organizations that load models from public repositories (Hugging Face, etc.) must audit their loading practices immediately.

CVE-2026-22778: vLLM Remote Code Execution via Malicious Video

Covered in Top Stories above. Update vLLM to version 0.14.1 or higher. If updates are not feasible, disable video model functionality in production environments.

Additional vLLM Vulnerabilities

vLLM version 0.11.0rc2 fixes a timing attack vulnerability in API key support. Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more characters the provided API key gets correct.

A memory corruption vulnerability in vLLM (CVE-2025-62164) affects versions 0.10.2 and later, placing production AI deployments at immediate risk. The vulnerability allows any user with access to the API to potentially achieve denial-of-service and remote code execution in the vLLM server process. The vulnerability stems from the way vLLM processes user-supplied prompt embeddings.

Organizations using vLLM should prioritize updating to the latest stable release and review all exposed API endpoints for unauthorized access.

Industry Radar

OpenSSF AI/ML Security Working Group Launches Bi-Weekly Meetings

Join us for the first OpenSSF Tech Talk of the year, focusing on agentic artificial intelligence (AI) security. In this session, we will explore how the OpenSSF AI/ML Security Working Group is developing open guidance and frameworks to help secure AI and machine learning systems, and how that work translates into real-world practice. The AI/ML Security WG has established a new bi-weekly meeting for collaboration on AI security work with representatives from OpenSSF, CoSAI, AGNTCY, NIST, SPDX, OWASP and more.

OpenSSF has released a new Compiler Annotations Guide for C and C++ to help developers improve memory safety, diagnostics, and overall software security by using compiler-supported annotations. The guide explains how annotations in GCC and Clang/LLVM can make code intent explicit, strengthen static analysis, reduce false positives, and enable more effective compile-time and run-time protections.

U.S. Treasury Releases AI Cybersecurity Resources for Financial Sector

In support of the President’s AI Action Plan, the U.S. Department of the Treasury announced the conclusion of a major public-private initiative to strengthen cybersecurity and risk management for artificial intelligence (AI) in the financial services sector. Over the course of February, Treasury will release a series of six resources developed in partnership with industry and federal and state regulatory partners.

The Artificial Intelligence Executive Oversight Group (AIEOG), a partnership between the Financial and Banking Information Infrastructure Committee and the Financial Services Sector Coordinating Council, brought together senior executives from financial institutions, federal and state financial regulators, and other key stakeholders. Together, participants focused on addressing identified gaps in the financial sector’s use of AI, developing practical tools that financial institutions can use to manage AI-specific cybersecurity risks.

NVIDIA Announces AI-Powered OT Cybersecurity Integrations

Akamai, Forescout, Palo Alto Networks, Siemens and Xage Security integrate NVIDIA accelerated computing and AI to advance OT cybersecurity. At the S4x26 security conference, Siemens will demonstrate its AI-ready Industrial Automation DataCenter. Through the integration of NVIDIA BlueField, it is uniquely possible to deliver a truly AI-ready, zero-trust solution tailored for the demands on industrial automation.

Anthropic Prompt Injection Metrics Published

Run a prompt injection attack against Claude Opus 4.6 in a constrained coding environment, and it fails every time, 0% success rate across 200 attempts, no safeguards needed. The latest models’ 212-page system card, released February 5, breaks out attack success rates by surface, by attempt count, and by safeguard configuration.

For years, prompt injection was a known risk that no one quantified. Security teams treated it as theoretical. AI developers treated it as a research problem. That changed when Anthropic made prompt injection measurable across four distinct agent surfaces, with attack success rates that security leaders can finally build procurement decisions around.

This represents a significant shift toward quantifiable AI security metrics, though the methodology and reproducibility of these measurements remain to be independently validated.

Policy Corner

U.S. AI Legislation Advances Through Committee

Multiple tech-focused bills advanced through committee this week. On Feb. 25, the House Science, Space and Technology Committee passed the ACERO Act, the Small Business Artificial Intelligence Advancement Act, and the ASCEND Act.

Sens. Todd Young, R-Ind., and Maria Cantwell, D-Wash., reintroduced their Future of AI Innovation Act on Thursday, a bill that seeks to establish uniform standards for AI research and development, as well as promote innovation in the private sector. Initially introduced in 2024, the measure aims to support U.S. leadership in AI through multiple vehicles, including public-private partnerships, codification of the Center for Artificial Intelligence Standards and Innovation within the National Institutes of Standards and Technology, increased interagency coordination and international coalitions.

No major updates on the EU AI Act enforcement or international AI governance beyond what was covered in previous weeks.

Research Spotlight

Prompt Injection and Jailbreak Vulnerability Studies

Multiple academic papers were published this week examining prompt injection and jailbreak attacks:

A new study evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. The study observes significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, several lightweight, inference-time defence mechanisms were evaluated that operate as filters without any retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts. Analysis of LLMs Against Prompt Injection and Jailbreak Attacks

Another comprehensive study published to arXiv evaluated over 1,400 adversarial prompts across four LLMs. Experiments evaluated over 1,400 adversarial prompts across four LLMs: GPT-4, Claude 2, Mistral 7B, and Vicuna. The analysis examines results along several dimensions, including model susceptibility, attack technique efficacy, prompt behavior patterns, and cross-model generalization. Among the tested models, GPT-4 demonstrated the highest vulnerability with an ASR of 87.2%, confirming its powerful but permissive instruction-following nature. Prompt injections exploiting roleplay dynamics (e.g., impersonation of fictional characters or hypothetical scenarios) achieved the highest ASR (89.6%). Logic trap attacks (ASR: 81.4%) exploit conditional structures and moral dilemmas to elicit disallowed content. Encoding tricks (e.g., base64 or zero-width characters) achieved 76.2% ASR by evading keyword-based filtering mechanisms. Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs

A comprehensive review of prompt injection attacks spanning 2023-2025 was published in MDPI Information journal. This comprehensive review synthesizes research from 2023 to 2025, analyzing 45 key sources, industry security reports, and documented real-world exploits. The review examines the taxonomy of prompt injection techniques, including direct jailbreaking and indirect injection through external content. The rise of AI agent systems and the Model Context Protocol (MCP) has dramatically expanded attack surfaces, introducing vulnerabilities such as tool poisoning and credential theft. Critical incidents documented include GitHub Copilot’s CVE-2025-53773 remote code execution vulnerability (CVSS 9.6) and ChatGPT’s Windows license key exposure. Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review

Adversarial Machine Learning Survey Published

A meta-survey of adversarial attacks against AI algorithms was published this week. Deep neural networks have revolutionized artificial intelligence, solving complex issues in areas like healthcare or law enforcement and security. However, they are susceptible to adversarial attacks where small data manipulations can compromise system reliability and security. This paper conducts an umbrella review of the literature on these attacks, synthesizing results from various systematic reviews to assess attack strategies, defense effectiveness, and research gaps. A meta-survey of adversarial attacks against artificial intelligence algorithms, including diffusion models

What This Means For You

Immediate Actions (This Week):

Patch PyTorch and vLLM immediately. If you’re running PyTorch < 2.6.0 or vLLM < 0.14.1, these are critical, actively exploitable vulnerabilities affecting the core of your ML stack. Audit all model loading code that uses torch.load() and verify you’re not relying on weights_only=True as your sole security control when loading untrusted models.
Review vendor dependencies for geopolitical risk. The Anthropic-Pentagon conflict demonstrates that AI vendor relationships can be severed overnight for policy reasons. If you’re a defense contractor or operate in regulated industries, map your AI supply chain and identify single points of vendor failure. Develop contingency plans for API access interruptions.
Measure your incident response speed. CrowdStrike’s 29-minute average breakout time means you have less than half an hour from initial access to lateral movement. Time-box your security operations playbooks and identify where manual workflows introduce unacceptable latency. If your MTTR exceeds attacker breakout time, you’re operating at a structural disadvantage.

Strategic Shifts (Next 30 Days):

Implement quantitative AI security metrics. Anthropic’s publication of prompt injection success rates by attack surface represents a maturation of the field. Work with your AI vendors to establish similar quantitative security SLAs. Move beyond qualitative “safety” claims to measurable attack resistance thresholds.
Adopt the NIST Cyber AI Profile. With the comment period closed, the final profile is imminent. Begin mapping your existing AI security controls to the profile’s three focus areas (Secure, Defend, Thwart) to identify gaps before auditors or regulators demand it.
Red team your AI agents. The Gray Swan prompt injection challenge and MITRE ATLAS agentic techniques provide a blueprint for testing autonomous AI systems. Run structured adversarial evaluations against production agents, particularly those with tool-calling or API access. Document failure modes before attackers do.

Long-Term Positioning:

The fundamental tension this week exposed is unresolved: AI systems are being deployed at scale faster than security controls can mature, while nation-states, vendors, and enterprises negotiate who controls the terms of use. Organizations that treat AI security as a bolt-on compliance exercise will find themselves caught between technical vulnerabilities and policy whiplash. Those that instrument security into the AI development lifecycle—with quantitative metrics, continuous adversarial testing, and vendor risk management—will maintain resilience regardless of which direction the regulatory wind blows.

Tools and Resources

MITRE ATLAS Navigator — Updated with 14 new agentic AI techniques. Use for threat modeling AI-specific attack paths.
NIST Cyber AI Profile (NIST IR 8596 iprd) — Preliminary draft integrating AI-specific considerations into CSF 2.0. Final version expected soon.
Gray Swan Indirect Prompt Injection Challenge — $40K prize pool, runs through March 11. Test your red teaming skills against frontier models.
OpenSSF Compiler Annotations Guide — Practical guidance for C/C++ memory safety using compiler-supported annotations.
U.S. Treasury AI Cybersecurity Resources — Six resources for financial sector AI risk management, released throughout February 2026.
PromptArmor Claude Cowork Vulnerability Disclosure — Independent research demonstrating hidden prompt injection in Anthropic’s Cowork feature. Useful for understanding file-based injection vectors.

Key Highlights