Last Week in AI Security — Week of March 16, 2026
vLLM remote code execution via video link (CVE-2026-22778), Palo Alto Networks' prompt fuzzing study exposes guardrail weaknesses across models, and HiddenLayer reports 1 in 8 companies hit by agentic AI breaches.
Key Highlights
- Critical vLLM RCE vulnerability (CVE-2026-22778) allows remote code execution via malicious video URL
- Palo Alto Unit 42 prompt fuzzing study shows evasion rates up to 96.65% against existing guardrails
- HiddenLayer's 2026 Threat Report: 1 in 8 companies report AI breaches linked to agentic systems
- Spring Boot patches CVE-2026-22732, a caching vulnerability exposing sensitive data via HTTP headers
- NVIDIA launches Agent Toolkit with OpenShell runtime for safer self-evolving AI agents
Executive Summary
This week brought a harsh reminder that the AI inference layer—where models are actively serving user requests—is a critical and often under-defended attack surface. A critical vulnerability, CVE-2026-22778, was discovered in vLLM, a popular framework for serving Large Language Models (LLMs) with high throughput, allowing an attacker to achieve Remote Code Execution (RCE) simply by sending a malicious video link to a vLLM API. The vulnerability chains a memory disclosure bug with a heap overflow in OpenCV’s bundled FFmpeg, enabling full server takeover on systems processing multimodal inputs. With vLLM downloaded over 3 million times per month, the blast radius is enormous.
Meanwhile, research from Palo Alto Networks’ Unit 42 demonstrated that even well-defended production models remain fragile under systematic adversarial testing. Unit 42 researchers developed a genetic algorithm-inspired prompt fuzzing method that automatically generates variants of disallowed requests while preserving their original meaning, uncovering guardrail weaknesses with evasion rates ranging from low single digits to as high as 96.65% for specific keyword and model combinations. The study underscores that small failure rates become reliable when attackers can automate at volume, a lesson that should inform every organization’s red teaming strategy.
On the industry front, HiddenLayer released its 2026 AI Threat Landscape Report, based on a survey of 250 IT and security leaders, analyzing the most pressing risks facing organizations as AI systems evolve from assistive tools to autonomous agents capable of independent action. While agentic AI remains in the early stages of enterprise deployment, the risks are already materializing: 1 in 8 companies now report AI breaches linked to agentic systems. The confluence of technical vulnerabilities, immature guardrails, and accelerating deployment of agentic systems paints a troubling picture: we are building faster than we are securing.
Top Stories
Critical vLLM RCE Lets Attackers Take Over Servers via Malicious Video Link
A critical vulnerability, CVE-2026-22778, was recently discovered in vLLM, allowing an attacker to achieve Remote Code Execution (RCE) simply by sending a malicious video link to a vLLM API. vLLM is a high-throughput, memory-efficient engine for serving Large Language Models (LLMs), running them faster and cheaper than general-purpose local runners such as Ollama, especially under heavy concurrent workloads.
The vulnerability chains two distinct flaws. The attack begins with a memory disclosure bug that leaks a libc address, reducing the ASLR search space, then exploits a heap overflow in the JPEG2000 decoder of FFmpeg 5.1.x bundled with OpenCV. Because vLLM uses OpenCV for video decoding, a video constructed from JPEG2000 frames can reach the vulnerable code path and lead to command execution. The resulting RCE enables full server takeover, including arbitrary command execution, data exfiltration, and lateral movement.
Any organization using vLLM to expose a video-capable model to user input is at risk. The mitigation is straightforward: update to vLLM 0.14.1, which bumps the bundled OpenCV to a fixed release. The vulnerability also highlights a broader risk: AI inference engines inherit memory-safety bugs from complex media processing libraries such as OpenCV and FFmpeg, which carry long histories of known CVEs.
For organizations deploying multimodal models, this incident should trigger an immediate supply-chain audit of all media processing dependencies and a review of whether model-serving endpoints accept untrusted multimedia inputs without sandboxing or pre-validation.
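To make the pre-validation idea concrete, the sketch below shows the kind of allowlist check a serving endpoint could run before fetching untrusted media, assuming the server fetches the URL itself. The schemes, content types, and size cap here are illustrative assumptions, not values from the advisory.

```python
# Hypothetical pre-validation for untrusted media URLs submitted to a
# multimodal endpoint. All allowlist values are illustrative placeholders.
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https"}
ALLOWED_CONTENT_TYPES = {"video/mp4", "image/png", "image/jpeg"}
MAX_BYTES = 50 * 1024 * 1024  # refuse oversized payloads up front

def precheck_media_url(url: str, content_type: str, size: int) -> bool:
    """Reject obviously unsafe inputs before they reach the decoder."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False  # no file://, ftp://, etc.
    if not parsed.hostname:
        return False  # malformed or schemeless URL
    if content_type.lower() not in ALLOWED_CONTENT_TYPES:
        return False  # e.g. refuse JPEG2000 payloads outright
    return 0 < size <= MAX_BYTES
```

A check like this is not a substitute for patching (declared content types and sizes can lie), but it narrows the set of untrusted bytes that ever reach a decoder.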
Palo Alto Unit 42 Study Exposes Fragility of LLM Guardrails Under Systematic Fuzzing
Palo Alto Networks’ Unit 42 researchers developed a genetic algorithm-inspired prompt fuzzing method that automatically generates variants of disallowed requests while preserving their original meaning, uncovering guardrail weaknesses with evasion rates ranging from low single digits to as high as 96.65% for specific keyword and model combinations. Published on March 17, 2026, the research demonstrates that the key difference from prior single-prompt jailbreak examples is scalability: small failure rates become reliable when attackers can automate at volume.
Despite years of investment in defenses, prompt jailbreaking and prompt injection remain among the most well-known and actively discussed attack classes against LLM applications; OWASP listed prompt injection as the top risk category for LLM applications in 2025. The study tested both closed-source and open-weight pretrained models, finding that evasion rates varied widely depending on the keyword and model combination.
By adapting a genetic algorithm-based fuzzing approach to generate meaning-preserving prompt variants, Unit 42 was able to trigger policy-violating outcomes against both closed-source and open-weight pretrained models. The practical implication for practitioners is clear: operationalize this kind of testing as continuous regression, running fuzzing-based adversarial evaluations whenever models, prompts, or filters change.
The research also calls for guardrails that are more robust to meaning-preserving variation and clearer evaluation standards that measure not just refusal rate, but boundary fragility and failure modes under automation. For security teams, this study provides a playbook: adopt adversarial fuzzing as a standard part of the model validation pipeline, measure guardrail robustness under variation (not just single-shot attacks), and prepare for attackers who will industrialize these techniques.
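To illustrate the fuzzing-as-regression idea, here is a heavily simplified, hypothetical sketch of a genetic-algorithm-style loop: mutate a disallowed prompt with meaning-preserving edits and keep variants that slip past a guardrail. The synonym table and keyword filter are toy stand-ins, not Unit 42's actual method or any production guardrail.

```python
import random

# Toy stand-in: meaning-preserving mutation via synonym substitution.
SYNONYMS = {"make": ["create", "produce"], "weapon": ["armament", "device"]}

def mutate(prompt: str, rng: random.Random) -> str:
    """Swap one random word for a synonym (no-op if none is known)."""
    words = prompt.split()
    i = rng.randrange(len(words))
    words[i] = rng.choice(SYNONYMS.get(words[i], [words[i]]))
    return " ".join(words)

def guardrail_blocks(prompt: str) -> bool:
    """Stand-in guardrail: an exact-keyword filter, deliberately brittle."""
    return "weapon" in prompt.split()

def fuzz(seed_prompt: str, generations: int = 50, seed: int = 0) -> list[str]:
    """Evolve variants of a disallowed prompt; collect those that evade."""
    rng = random.Random(seed)
    population = [seed_prompt]
    evasions = []
    for _ in range(generations):
        variant = mutate(rng.choice(population), rng)
        population.append(variant)
        if not guardrail_blocks(variant):
            evasions.append(variant)
    return evasions
```

Even this toy loop shows the study's core point: a filter that blocks a single phrasing reliably fails once an attacker automates cheap, meaning-preserving variation.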
HiddenLayer’s 2026 Threat Report Finds 1 in 8 Companies Hit by Agentic AI Breaches
HiddenLayer, a leading artificial intelligence security company, has released its 2026 AI Threat Landscape Report, an analysis of the most pressing risks facing organizations as AI systems evolve from assistive tools to autonomous agents capable of independent action, based on a survey of 250 IT and security leaders. The report’s most striking finding: while agentic AI remains in the early stages of enterprise deployment, the risks are already materializing, with 1 in 8 companies reporting AI breaches now linked to agentic systems.
The report reveals a troubling disconnect between adoption velocity and security readiness. One survey cited in the report found that 83% of organizations planned to deploy agentic AI capabilities into their business functions, while only 29% reported being ready to operate those systems securely. This mismatch between ambition and preparedness is already manifesting in real incidents.
Earlier analysis from Palo Alto Networks corroborates the risk. By using a “single, well-crafted prompt injection or by exploiting a ‘tool misuse’ vulnerability,” adversaries now “have an autonomous insider at their command, one that can silently execute trades, delete backups, or pivot to exfiltrate the entire customer database”. The prompt-injection problem is likely to get worse before it gets better: security teams simply do not believe these systems are locked down well enough.
The report’s broader message is that security frameworks and governance controls are not keeping pace with AI system capabilities. For CISOs, the conclusion is unambiguous: agentic AI is not a future threat but a present-day attack vector that demands immediate investment in governance, monitoring, and response capabilities.
Framework & Standards Updates
No major updates to NIST AI RMF, ISO 42001, OWASP LLM Top 10, or MITRE ATLAS were reported this week. However, several relevant context updates emerged:
- NIST Cyber AI Profile: NIST’s preliminary draft Cybersecurity Framework Profile for Artificial Intelligence remains under development; the public comment period on the preliminary draft closed on January 30, 2026, with an initial public draft expected later in 2026. The profile aims to help organizations incorporate AI into cybersecurity planning by suggesting key actions to prioritize.
- ISO 42001 Adoption: In January 2026, NQA was granted UKAS accreditation for the certification of ISO/IEC 42001, enabling organizations to achieve accredited, independently verified certification to this emerging and increasingly important standard. K&L Gates LLP earned ISO/IEC 42001:2023 certification for its Artificial Intelligence Management System (AIMS) on March 9, 2026, becoming one of the first law firms worldwide to achieve the internationally recognized standard for AI governance.
Vulnerability Watch
CVE-2026-22778 — vLLM Remote Code Execution via Malicious Video URL
Severity: Critical
CVSS: Not yet assigned (likely 9.0+)
Affected: vLLM versions prior to 0.14.1
A critical vulnerability allows an attacker to achieve Remote Code Execution (RCE) simply by sending a malicious video link to a vLLM API. The vulnerability chains a memory leak with a heap overflow in FFmpeg’s JPEG2000 decoder bundled with OpenCV. Mitigation: Update vLLM to the latest version that includes the fix (0.14.1).
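A deployment audit can flag vulnerable installs with a simple version-floor check, sketched below under stated assumptions: it handles only numeric dotted versions, and a real audit would use a proper version parser such as packaging.version.

```python
# Hypothetical version-floor check for the vLLM fix named above.
FIXED_VERSION = (0, 14, 1)  # CVE-2026-22778 fixed in 0.14.1, per this advisory

def parse_version(text: str) -> tuple[int, ...]:
    """Parse a dotted version string like '0.14.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())

def needs_upgrade(installed: str) -> bool:
    """True if the installed vLLM version predates the fix."""
    return parse_version(installed) < FIXED_VERSION
```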
CVE-2026-22732 — Spring Security HTTP Header Caching Vulnerability
Severity: High
CVSS: Not specified
Affected: Spring Boot versions prior to 4.1.0-M3
A vulnerability in Spring Security for servlet applications allows sensitive data to be exposed via caching mechanisms because expected HTTP response headers are not written when response headers are specified. Spring Boot 4.1.0-M3 (the third milestone release) addresses this CVE. Mitigation: Upgrade to Spring Boot 4.1.0-M3 or later. More details are available in the Spring Boot release notes.
CVE-2026-0994 — Protobuf Vulnerability (Patched in vLLM)
Severity: Not specified
Affected: vLLM versions prior to v0.8.0
vLLM patched a protobuf vulnerability tracked as CVE-2026-0994 in release v0.8.0. No further technical details were disclosed. Organizations using vLLM should ensure they are running v0.8.0 or later.
CVE-2025-62164 — vLLM Prompt Embeddings Deserialization RCE
Severity: Critical
CVSS: 8.8
Affected: vLLM versions prior to 0.7.1 (initial patch), with follow-up protection added in v0.7.4
A critical flaw (CVE-2025-62164) in vLLM allows attackers to achieve Remote Code Execution (RCE) or crash servers simply by sending a malicious API request; the vulnerability lies in how vLLM handles prompt embeddings. vLLM added additional protection for CVE-2025-62164 in version v0.7.4. Note: This CVE was previously patched but received additional hardening this week. Organizations should verify they are running at least v0.7.4.
Industry Radar
- NVIDIA Announces Agent Toolkit with OpenShell Runtime: On March 16, 2026, NVIDIA announced the Agent Toolkit, which includes the NVIDIA OpenShell open source runtime for building self-evolving agents with greater safety, security, and efficiency. NVIDIA is teaming with partners on open source software for autonomous, self-evolving enterprise AI agents, and the toolkit integrates with leading software platforms including Adobe, Atlassian, CrowdStrike, and ServiceNow.
- K&L Gates Achieves ISO 42001 Certification: Global law firm K&L Gates LLP earned ISO/IEC 42001:2023 certification for its Artificial Intelligence Management System (AIMS) on March 9, 2026, becoming one of the first law firms worldwide to achieve the internationally recognized standard for AI governance.
Policy Corner
No significant new regulatory developments were reported for the week of March 16–22, 2026. The EU AI Act remains on track for phased enforcement into 2026, with high-risk AI system obligations beginning later this year.
Research Spotlight
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Published: May 2025 (arXiv)
This comprehensive study evaluated over 1,400 adversarial prompts across GPT-4, Claude 2, Mistral 7B, and Vicuna. GPT-4 demonstrated the highest vulnerability, with an attack success rate (ASR) of 87.2%, confirming its powerful but permissive instruction-following nature, while Claude 2 filtered slightly better but still succumbed to 82.5% of attacks. Prompt injections exploiting roleplay dynamics (e.g., impersonation of fictional characters or hypothetical scenarios) achieved the highest ASR (89.6%), often bypassing filters by deflecting responsibility away from the model. The research provides empirical evidence that jailbreak techniques are both transferable and evolving.
Open, Closed and Broken: Prompt Fuzzing Finds LLMs Still Fragile Across Open and Closed Models
Palo Alto Networks Unit 42, March 17, 2026
Unit 42 researchers developed a genetic algorithm-inspired prompt fuzzing method that automatically generates meaning-preserving variants of disallowed requests, uncovering guardrail weaknesses with evasion rates ranging from low single digits to as high as 96.65% for specific keyword and model combinations. The key insight: small failure rates become reliable when attackers can automate at volume. The study provides a blueprint for adversarial testing and underscores that prompt jailbreaking remains a practical risk even after years of safety engineering progress.
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review
MDPI Information, January 7, 2026
This comprehensive review synthesizes research from 2023 to 2025, analyzing 45 key sources, industry security reports, and documented real-world exploits. It examines the taxonomy of prompt injection techniques, including direct jailbreaking and indirect injection through external content, and argues that the rise of AI agent systems and the Model Context Protocol (MCP) has dramatically expanded attack surfaces. The review documents critical incidents including GitHub Copilot’s CVE-2025-53773 remote code execution vulnerability (CVSS 9.6) and ChatGPT’s Windows license key exposure.
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
arXiv, February 2026
This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. The authors observe significant behavioural variation across models and evaluate several lightweight, inference-time defence mechanisms that operate as filters without any retraining, though these defences are consistently bypassed by long, reasoning-heavy prompts. The study highlights that silent non-responsiveness (triggered by internal safety mechanisms) is a common failure mode across models.
What This Means For You
This week’s developments underscore three urgent priorities for security practitioners:
1. Audit and harden AI inference endpoints immediately. The vLLM CVE-2026-22778 vulnerability is a wake-up call: inference servers are production systems that require the same rigorous patching, dependency management, and input validation as any other public-facing service. If you’re running vLLM, upgrade to 0.14.1 now. More broadly, conduct a supply-chain audit of all media processing libraries (OpenCV, FFmpeg, Pillow, etc.) used by your AI systems and ensure you have a process to track and patch CVEs in these dependencies. Consider sandboxing inference workloads and restricting the file types and URLs your models can process.
2. Adopt adversarial fuzzing as a standard part of your model validation pipeline. The Palo Alto Unit 42 study demonstrates that single-shot jailbreak testing is insufficient. Attackers will automate prompt generation, and small failure rates compound into reliable exploits at scale. Security teams should operationalize continuous adversarial testing using fuzzing frameworks that generate meaning-preserving prompt variants. Measure not just refusal rate, but boundary fragility—how much variation can your guardrails tolerate before they fail? Integrate these evaluations into your CI/CD pipeline so that every model update or prompt template change triggers a new round of red teaming.
3. Treat agentic AI deployments as high-risk insider threats. HiddenLayer’s finding that 1 in 8 companies have already experienced agentic AI breaches should set off alarms. Autonomous agents with tool access are effectively automated insiders, and prompt injection is the new social engineering. Implement human-in-the-loop controls for high-privilege actions, apply least-privilege principles to agent tool access, and log every tool invocation for forensic analysis. If your organization is part of the 83% planning to deploy agentic AI but not the 29% ready to operate it securely, you need to slow down and build governance structures first.
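The tool-gating pattern from point 3 can be sketched minimally as follows, with hypothetical tool names: every call is logged for forensics, unlisted tools are denied, and high-privilege actions wait for human approval.

```python
# Hypothetical least-privilege gate for agent tool calls.
# Tool names and the allowlists are illustrative, not from any real runtime.
from dataclasses import dataclass, field

HIGH_PRIVILEGE = {"delete_backup", "execute_trade", "export_database"}
ALLOWED_TOOLS = {"search_docs", "summarize", "delete_backup"}

@dataclass
class ToolGate:
    audit_log: list = field(default_factory=list)
    pending_approval: list = field(default_factory=list)

    def invoke(self, tool: str, args: dict) -> str:
        """Gate a tool call: deny unlisted tools, queue risky ones."""
        self.audit_log.append((tool, args))  # forensic trail for every attempt
        if tool not in ALLOWED_TOOLS:
            return "denied"  # least privilege: default-deny
        if tool in HIGH_PRIVILEGE:
            self.pending_approval.append((tool, args))
            return "pending_human_approval"  # human-in-the-loop
        return "executed"
```

The design choice worth copying is default-deny plus an unconditional audit log: even a denied or queued call leaves a record, which is what forensic analysis of an injected agent depends on.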
Tools and Resources
- vLLM v0.14.1 — Latest release includes critical patches for CVE-2026-22778 (video RCE) and CVE-2026-0994 (protobuf). Upgrade immediately if you are running older versions.
- NVIDIA Agent Toolkit — Announced March 16, 2026, includes NVIDIA OpenShell open source runtime for building self-evolving agents with enhanced safety and security. Designed for enterprises deploying agentic AI with stricter permission boundaries.
- NIST Cyber AI Profile (Preliminary Draft) — NIST’s draft profile for incorporating AI into cybersecurity planning. Public comment period ran through January 30, 2026; initial public draft expected later in 2026.
- HiddenLayer 2026 AI Threat Landscape Report — Survey of 250 IT and security leaders revealing that 1 in 8 companies have reported AI breaches linked to agentic systems. Essential reading for CISOs evaluating agentic AI risk.