Last Week in AI Security — Week of March 2, 2026
OpenAI launches Codex Security agent; Palo Alto warns AI agents are 2026's top insider threat; vLLM RCE and LangChain serialization vulnerabilities disclosed.
Key Highlights
- OpenAI launches Codex Security agent finding 10,561 high-severity vulnerabilities across 1.2M commits
- Palo Alto Networks: AI agents represent new insider threat, with 40% enterprise app integration by 2026
- MITRE ATLAS publishes first 2026 update with Zenity contributions on agentic AI attack techniques
- NIST releases preliminary Cyber AI Profile draft integrating CSF 2.0 with AI-specific security guidance
- Cisco State of AI Security 2026: 83% deploying agentic AI, only 29% operationally secure
Executive Summary
This week marked a critical inflection point in AI security as the gap between deployment velocity and defensive maturity widened to dangerous proportions. While organizations race to integrate agentic AI into production environments, fundamental security controls lag far behind. According to Cisco’s State of AI Security 2026 report, 83% of organizations plan to deploy agentic AI capabilities into business functions, while only 29% report being ready to operate those systems securely.
In the first MITRE ATLAS update of 2026, Zenity researchers contributed substantially to expanding the framework’s coverage of agentic AI threats, introducing new attack techniques and guidance for securing autonomous AI systems. This update comes as Palo Alto Networks Chief Security Intel Officer Wendi Whitmore declared AI agents the new insider threat for 2026, with Gartner estimating that 40% of all enterprise applications will integrate with task-specific AI agents by year-end, up from less than 5% in 2025.
On the vulnerability front, critical flaws continue to expose the fragility of the AI software supply chain. While last week’s vLLM CVE-2026-22778 and PyTorch CVE-2025-32434 were previously covered, new research this week documented widespread prompt injection attacks in production systems. Large language models and AI agents are becoming deeply integrated into web browsers, search engines, and automated content-processing pipelines, introducing a new and largely underexplored attack surface, according to Palo Alto’s Unit 42 team, who published the first in-the-wild observations of web-based indirect prompt injection on March 3.
Top Stories
OpenAI Launches Codex Security Agent for Vulnerability Discovery
OpenAI on Friday began rolling out Codex Security, an artificial intelligence (AI)-powered security agent that’s designed to find, validate, and propose fixes for vulnerabilities. The feature is available in a research preview to ChatGPT Pro, Enterprise, Business, and Edu customers via the Codex web with free usage for the next month.
The launch represents a significant shift in how AI is being applied to application security. OpenAI Codex Security scanned 1.2 million commits and found 10,561 high-severity issues, demonstrating the scale at which AI agents can now analyze codebases for security flaws. The system builds deep context about projects to identify vulnerabilities, validate their exploitability, and propose remediation strategies.
This development comes as organizations struggle with the sheer volume of disclosed vulnerabilities and the limited bandwidth of human security teams. The introduction of autonomous vulnerability discovery agents accelerates both offensive capabilities (finding flaws faster) and defensive responses (proposing fixes automatically), creating a new arms race dynamic in application security.
Palo Alto Networks: AI Agents Emerge as Primary Insider Threat
AI agents represent the new insider threat to companies in 2026, according to Palo Alto Networks Chief Security Intel Officer Wendi Whitmore. The CISO and security teams find themselves under massive pressure to deploy new technology quickly, creating enormous workload to understand if new AI applications are secure enough for their use cases.
The threat stems from what Whitmore calls the “superuser problem.” One of the risks stems from autonomous agents being granted broad permissions, creating a “superuser” that can chain together access to sensitive applications and resources without security teams’ knowledge or approval. By using a single, well-crafted prompt injection or by exploiting a ‘tool misuse’ vulnerability, adversaries now have an autonomous insider at their command, one that can silently execute trades, delete backups, or pivot to exfiltrate the entire customer database.
On one hand, AI agents can help fill the ongoing cyber-skills gap, doing things like correcting buggy code, automating log scans and alert triage, and rapidly blocking security threats. When we look through the defender lens, agentic capabilities allow us to start thinking more strategically about how we defend our networks, versus always being caught in this reactive situation.
The dual nature of AI agents—powerful defenders and potential attack vectors—requires organizations to implement least-privilege principles for autonomous systems just as rigorously as they do for human users.
Prompt Injection Attacks Observed in Production Web Applications
Large language models (LLMs) and AI agents are becoming deeply integrated into web browsers, search engines and automated content-processing pipelines. While these integrations can expand functionality, they also introduce a new and largely underexplored attack surface, according to Unit 42 research published March 3.
Palo Alto Networks documented real-world indirect prompt injection attacks targeting AI-powered content moderation and ad review systems. Attackers use a variety of techniques to embed prompts within webpages, primarily to conceal them from users and evade detection by manual review, signature-based matching and other security checks. Attackers employ diverse techniques to deliver a consistent malicious prompt to maximize their chances of success and bypass security tools.
The International AI Safety Report 2026 found that sophisticated attackers bypass the best-defended models approximately 50% of the time with just 10 attempts. Anthropic’s system card for Claude Opus 4.6 quantified that a single prompt injection attempt against a GUI-based agent succeeds 17.8% of the time without safeguards.
The emergence of prompt injection as a persistent, production-scale threat validates OWASP’s decision to rank it as LLM01 in the 2025 Top 10. Unlike traditional injection attacks, prompt injection exploits a fundamental architectural limitation: LLMs cannot reliably distinguish between trusted instructions and untrusted data.
Framework & Standards Updates
NIST Releases Preliminary Cyber AI Profile Draft
The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) has released an initial preliminary draft of the Cybersecurity Framework Profile for Artificial Intelligence (Cyber AI Profile or NIST IR 8596). The preliminary draft is designed as a voluntary framework that would extend the recently updated NIST Cybersecurity Framework (CSF) 2.0 to new cybersecurity risks and opportunities introduced by AI and to also complement NIST’s AI Risk Management Framework (AI RMF).
The preliminary draft of the Cyber AI Profile is organized around: Three Focus Areas: Secure (securing AI systems); Defend (conducting AI-enabled cyber defense); and Thwart (thwarting adversarial cyberattacks using AI). Six CSF 2.0 Core Functions: Govern, Identify, Protect, Detect, Respond, and Recover.
Following the 45-day comment period, NIST plans to develop the initial public draft for release in 2026. When finalized, the profile will help organizations incorporate AI into their cybersecurity planning by suggesting key actions to prioritize, highlighting special considerations from specific parts of the CSF when considering AI, and providing mappings to other NIST resources, including the AI Risk Management Framework.
MITRE ATLAS First 2026 Update Focuses on Agentic AI
In the first MITRE ATLAS update of 2026, Zenity researchers contributed substantially to expanding the framework’s coverage of agentic AI threats, reflecting a reality we see across enterprises: AI agents are operational, privileged, and deeply embedded in business workflows. These contributions add clarity and rigor to a threat class that has previously been poorly defined.
AI agents differ from traditional AI models in one critical way: they act. Agents can browse the web, invoke tools, access APIs, read and write data, authenticate to services, and make decisions with limited or no human oversight. This autonomy fundamentally changes the attack surface.
MITRE ATLAS conducted rapid investigations of OpenClaw, analyzing critical incidents reported by the AI security community, mapping associated security threats to ATLAS tactics, techniques, and procedures (TTPs), and identifying corresponding mitigations. OpenClaw is especially unique because it can independently make decisions, take actions, and complete tasks across users’ operational systems and environments without continuous human oversight.
AIUC-1 Consortium Publishes Agentic AI Security Briefing
Enterprise AI deployments have shifted from pilot programs to production systems handling customer data, executing business transactions, and integrating with core infrastructure. That has exposed a significant gap between what AI agents can do and what security teams can observe or control. A briefing published by the AIUC-1 Consortium, developed with input from Stanford’s Trustworthy AI Research Lab and more than 40 security executives, documents the security conditions that emerged in 2025 and projects the risks most likely to affect organizations in 2026.
Sixty-three percent of employees who used AI tools in 2025 pasted sensitive company data, including source code and customer records, into personal chatbot accounts. The average enterprise has an estimated 1,200 unofficial AI applications in use, with 86% of organizations reporting no visibility into their AI data flows. Shadow AI breaches cost an average of $670,000 more than standard security incidents.
Vulnerability Watch
CVE-2025-68664: LangChain Serialization Injection (CVSS: High)
A serialization injection vulnerability exists in LangChain’s dumps() and dumpd() functions. The functions do not escape dictionaries with ‘lc’ keys when serializing free-form dictionaries. The ‘lc’ key is used internally by LangChain to mark serialized objects. When user-controlled data contains this key structure, it is treated as a legitimate LangChain object during deserialization rather than plain user data.
The vulnerability enables multiple attack vectors. Attackers who control serialized data can extract environment variable secrets by injecting {“lc”: 1, “type”: “secret”, “id”: [“ENV_VAR”]} to load environment variables during deserialization (when secrets_from_env=True, which was the old default). They can also instantiate classes with controlled parameters by injecting constructor structures to instantiate any class within trusted namespaces.
Affected components include astream_events(version=“v1”), langchain-community caches, LangChain Hub via hub.pull, and several other features that deserialize untrusted data. The patch introduces breaking changes including allowlist enforcement and secrets_from_env changed from True to False by default.
Mitigation: Update to the latest patched version of LangChain and review all uses of dumps()/loads() with untrusted data. Avoid using astream_events v1 with untrusted inputs.
Note on Previously Covered Vulnerabilities
CVE-2026-22778 (vLLM RCE) and CVE-2025-32434 (PyTorch) were extensively covered in last week’s digest and are not repeated here. Organizations should prioritize patching these critical vulnerabilities if not already completed.
Industry Radar
-
Cisco State of AI Security 2026 Report Released: Cisco’s State of AI Security 2026 report looks at how threats are showing up in real systems. In many cases, attackers are not trying to break the model itself. They target the surrounding components that feed information into the model or allow it to interact with other systems, such as training datasets, model repositories, external tools, and agent frameworks.
-
IBM X-Force Threat Intelligence Index 2026: IBM released the 2026 X-Force Threat Intelligence Index, revealing that cybercriminals are exploiting basic security gaps at dramatically higher rates, now accelerated by AI tools that help attackers identify weaknesses faster than ever. IBM X‑Force observed a 44% increase in attacks that began with the exploitation of public-facing applications, largely driven by missing authentication controls and AI-enabled vulnerability discovery. Released February 25 (prior week).
-
Google TensorFlow 2.21 and LiteRT Released: Google is shifting its TensorFlow Core resources to focus heavily on long-term stability. The development team will now exclusively focus on: Security and bug fixes, dependency updates, and community contributions. TensorFlow enters maintenance mode with a focus on stability rather than new features.
Policy Corner
No significant AI security policy developments occurred during the March 2-8, 2026 period that were not already covered in previous weeks.
Research Spotlight
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Our experiments evaluated over 1,400 adversarial prompts across four LLMs: GPT-4, Claude 2, Mistral 7B, and Vicuna. We analyze results along several dimensions, including model susceptibility, attack technique efficacy, prompt behavior patterns, and cross-model generalization. Among the tested models, GPT-4 demonstrated the highest vulnerability with an ASR of 87.2%, confirming its powerful but permissive instruction-following nature. Published May 2025, arXiv.
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks
This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset across multiple open-source LLMs, including Phi, Mistral, DeepSeek-R1, Llama 3.2, Qwen, and Gemma variants. We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms. Furthermore, we evaluated several lightweight, inference-time defence mechanisms that operate as filters without any retraining or GPU-intensive fine-tuning. Although these defences mitigate straightforward attacks, they are consistently bypassed by long, reasoning-heavy prompts. arXiv preprint, February 2026.
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review
Large language models (LLMs) have rapidly transformed artificial intelligence applications across industries, yet their integration into production systems has unveiled critical security vulnerabilities, chief among them prompt injection attacks. This comprehensive review synthesizes research from 2023 to 2025, analyzing 45 key sources, industry security reports, and documented real-world exploits. We examine the taxonomy of prompt injection techniques, including direct jailbreaking and indirect injection through external content. The rise of AI agent systems and the Model Context Protocol (MCP) has dramatically expanded attack surfaces, introducing vulnerabilities such as tool poisoning and credential theft. We document critical incidents including GitHub Copilot’s CVE-2025-53773 remote code execution vulnerability (CVSS 9.6) and ChatGPT’s Windows license key exposure. MDPI Information journal, January 2026.
Adversarial Machine Learning: A Review of Methods, Tools, and Critical Industry Sectors
This paper surveys the Adversarial Machine Learning (AML) landscape in modern AI systems, while focusing on the dual aspects of robustness and privacy. Initially, we explore adversarial attacks and defenses using comprehensive taxonomies. Subsequently, we investigate robustness benchmarks alongside open-source AML technologies and software tools that ML system stakeholders can use to develop robust AI systems. Artificial Intelligence Review, Springer, May 2025.
What This Means For You
If you’re deploying AI agents this quarter, pause and implement least-privilege controls now. The pattern is unmistakable: organizations are granting AI agents broad permissions to “get things done,” creating superuser-equivalent access without corresponding audit trails or access controls. Before your next agent deployment, map every API it can call, every data source it can access, and every action it can take. Then restrict each to the absolute minimum required. It becomes equally as important for us to make sure that we are only deploying the least amount of privileges needed to get a job done, just like we would do for humans.
Treat prompt injection as a production threat, not a research curiosity. The Unit 42 findings this week confirm what researchers have been warning: prompt injection attacks are happening in the wild, targeting real applications. If your LLM-powered system processes any external content—emails, web pages, uploaded documents, API responses—assume attackers will attempt to inject instructions. Implement input sanitization, output validation, and consider architectural controls that separate instructions from data. Prompt injection ranks #1 on the OWASP Top 10 for LLM Applications 2025 because it exploits a fundamental architectural weakness: LLMs cannot reliably distinguish between trusted instructions and untrusted data. The International AI Safety Report 2026 found that sophisticated attackers bypass the best-defended models approximately 50% of the time with just 10 attempts.
Use the NIST Cyber AI Profile to audit your AI security posture. Even in preliminary draft form, the NIST framework provides a structured way to identify gaps in how you secure AI systems, use AI for defense, and protect against AI-enabled attacks. Map your current AI deployments to the three focus areas (Secure, Defend, Thwart) and identify where you lack visibility, controls, or incident response capabilities. The 45-day comment period is your opportunity to shape the framework before it becomes the de facto standard.
Tools and Resources
-
MITRE ATLAS — Updated framework for adversarial threat landscape in AI systems, with new coverage of agentic AI attack techniques contributed by Zenity researchers in the first 2026 update.
-
NIST Cyber AI Profile (Preliminary Draft) — Initial preliminary draft extending NIST CSF 2.0 with AI-specific security considerations across Secure, Defend, and Thwart focus areas.
-
OpenAI Codex Security — AI-powered security agent for vulnerability discovery, validation, and remediation, available in research preview for ChatGPT Pro, Enterprise, Business, and Edu customers.
-
AIUC-1 Consortium Briefing — Security guidance for agentic AI deployments developed with Stanford’s Trustworthy AI Research Lab and 40+ security executives, documenting enterprise AI security gaps identified in 2025.