Last Week in AI Security — Week of March 9, 2026

Executive Summary

This week witnessed a collision between AI security engineering and geopolitical power dynamics, as the U.S. Department of Defense’s blacklisting of Anthropic over ethical red lines sent shockwaves through the industry. While Anthropic refused to permit its models to be used for domestic mass surveillance or fully autonomous lethal targeting, the Pentagon responded by designating the company a “supply chain risk”—a label historically reserved for foreign adversaries. The ensuing legal battle and cross-industry employee backlash, including an amicus brief signed by over 30 Google and OpenAI staff, underscores a widening fracture between AI safety culture and national security imperatives.

On the technical security front, OpenAI announced the acquisition of Promptfoo on March 9, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development. The Promptfoo team has built a powerful suite of tools trusted by over 25 percent of Fortune 500 companies, along with a widely used open-source CLI and library for evaluating and red-teaming LLM applications. This acquisition signals a strategic shift toward embedding security evaluation directly into the AI development lifecycle, as enterprises grapple with the expanding attack surface introduced by agentic AI.

Meanwhile, Google published patches for CVE-2026-0628, a high-severity vulnerability in Chrome’s Gemini AI panel that allowed malicious extensions to inject code and access cameras and microphones; researchers showed attackers could also take screenshots, access local files, and launch phishing content inside the panel. This incident highlights the risks of deeply integrating AI into browser environments, where traditional isolation models break down. On the regulatory front, the European Commission published a draft implementing regulation on March 12, 2026, detailing how it will evaluate general-purpose AI models and impose fines under the EU AI Act, opening a four-week public feedback window running until April 9, 2026.

Top Stories

OpenAI Acquires Promptfoo to Embed Security Testing in AI Development

OpenAI announced on March 9 the acquisition of Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development. Once the acquisition is finalized, Promptfoo’s technology will be integrated directly into OpenAI Frontier, the platform for building and operating AI coworkers.

The deal represents a significant evolution in how AI security is approached. As enterprises deploy AI coworkers into real workflows, evaluation, security, and compliance become foundational requirements; enterprises need systematic ways to test agent behavior, detect risks before deployment, and maintain clear records to support oversight, governance, and accountability over time. Automated security testing and red-teaming capabilities will become a native part of the Frontier platform, helping enterprises identify and remediate risks like prompt injections, jailbreaks, data leaks, tool misuse, and out-of-policy agent behaviors.

The acquisition comes as organizations struggle with the operational reality that traditional security tooling was never designed for agentic systems. Promptfoo’s open-source lineage—widely used by developers for adversarial testing—positions it as a rare example of a commercially viable security tool with grassroots credibility in the AI research community. By integrating Promptfoo’s red-teaming capabilities at the platform level, OpenAI is signaling that security cannot remain a post-deployment afterthought for AI systems granted autonomy and tool access.

Pentagon Blacklists Anthropic Over AI Ethics Red Lines, Sparking Industry Uproar

The Pentagon labeled Anthropic a supply-chain risk—usually reserved for foreign adversaries—after the AI firm refused to allow the Department of Defense to use its technology for mass surveillance of Americans or autonomously firing weapons; the DOD argued it should be able to use AI for any “lawful” purpose and not be constrained by a private contractor. Anthropic filed two lawsuits on Monday contesting the government’s authority to label it a “supply chain risk to national security,” an unprecedented designation previously reserved for foreign adversaries, following orders from President Trump for all federal agencies to “immediately cease” all use of Anthropic’s technology.

The fallout was immediate and unprecedented. More than 30 OpenAI and Google DeepMind employees filed a statement Monday supporting Anthropic’s lawsuit against the U.S. Defense Department, with signatories including Google DeepMind chief scientist Jeff Dean; the brief reads “The government’s designation of Anthropic as a supply chain risk was an improper and arbitrary use of power that has serious ramifications for our industry”. More than 875 employees across Google and OpenAI signed an open letter backing Anthropic’s stance, stating “They’re trying to divide each company with fear that the other will give in; that strategy only works if none of us know where the others stand”.

The timing of OpenAI’s competing Pentagon deal added fuel to the controversy. Within moments of designating Anthropic a supply-chain risk, the DOD signed a deal with OpenAI—a move many of the ChatGPT maker’s employees protested. OpenAI signed a deal with the Pentagon on Friday evening, just a few hours after Anthropic was blacklisted; OpenAI CEO Sam Altman said the agreement preserved the same principles Anthropic had been blacklisted for defending, with the difference being the enforcement mechanism: instead of hard contractual prohibitions, OpenAI accepted the “all lawful purposes” framework but layered on architectural controls including cloud-only deployment, a proprietary safety stack the Pentagon agreed not to override, and cleared engineers embedded forward.

The case exposes a fundamental tension: whether private AI companies can impose ethical constraints on government use, or whether national security demands override corporate values. Anthropic’s legal challenge will likely set precedent for the limits of AI vendor discretion in dual-use contexts. For security practitioners, the incident highlights that AI risk management increasingly operates at the intersection of technical controls, contractual guardrails, and geopolitical pressure—domains that require coordination across engineering, legal, and policy teams.

Chrome Gemini AI Panel Vulnerability Exposes Risks of Browser-Integrated AI

Google published patches for CVE-2026-0628, a high-severity vulnerability in Chrome’s Gemini AI panel that allowed malicious extensions to inject code and access cameras and microphones; researchers showed attackers could also take screenshots, access local files, and launch phishing content inside the panel. The vulnerability underscores the security challenges introduced when AI systems are deeply embedded into browser environments with privileged access to user context.

Browser-based AI agents represent a particularly challenging attack surface. Unlike traditional web applications confined to sandboxed iframes, AI panels often require elevated permissions to read page content, access local files, and integrate with system APIs—creating a rich target for malicious extensions or compromised third-party integrations. The CVE-2026-0628 exploit demonstrated that once an attacker gains code execution within the AI panel context, traditional browser isolation breaks down.

Security teams should treat browser-integrated AI as a first-class attack vector requiring dedicated threat modeling. Key mitigations include: enforcing strict content security policies for AI panel rendering contexts, limiting API access based on least-privilege principles, implementing robust input validation on any data passed to the AI backend, and monitoring for anomalous extension behavior that attempts to interact with AI components. As more browsers integrate conversational AI directly into the user interface, the historical separation between “web content” and “browser chrome” will continue to erode, demanding new security primitives.

Framework & Standards Updates

The European Commission published a draft implementing regulation (Ares(2026)2709234) on March 12, 2026, setting out detailed procedural arrangements for how the Commission will evaluate general-purpose AI models under Article 92 of the EU AI Act and conduct enforcement proceedings, including fines, under Article 101; the draft covers access rights to model weights and source code, independent expert selection, conflict-of-interest rules, interim measures, the right to be heard, access to the Commission’s file, and five-year limitation periods for fines. The document opened a four-week public feedback window running until April 9, 2026, with Commission adoption planned for the second quarter of 2026.

This procedural regulation is significant because it translates the EU AI Act’s high-level enforcement provisions into concrete investigative steps. Organizations deploying general-purpose models in the EU market should review the draft to understand what technical artifacts (model weights, training data provenance, evaluation logs) may be subject to regulatory inspection. The regulation’s provisions on independent expert access suggest that model providers will need to maintain not only compliance documentation but also technical interfaces that allow external auditors to reproduce safety and capability evaluations.

On March 13, 2026, the Council agreed its position on the proposal to streamline certain rules regarding artificial intelligence, forming part of the “Omnibus VII” legislative package in the EU’s simplification agenda; the package includes proposals for two regulations aiming to simplify the EU’s digital legislative framework and the implementation of harmonised rules on AI. The text introduces a fixed timeline for the delayed application of high-risk rules: the new application dates would be 2 December 2027 for stand-alone high-risk AI systems and 2 August 2028 for high-risk AI systems embedded in products.

Vulnerability Watch

CVE-2026-0628 — Chrome Gemini AI Panel Code Injection (High severity)
Google patched CVE-2026-0628, a high-severity vulnerability in Chrome’s Gemini AI panel that allowed malicious extensions to inject code and access cameras and microphones; researchers showed attackers could also take screenshots, access local files, and launch phishing content inside the panel. Mitigation: Update to the latest Chrome stable release. Organizations deploying managed Chrome instances should review extension policies and restrict installation of untrusted extensions.

CVE-2026-1492 — WordPress User Registration Plugin Privilege Escalation (Critical, CVSS 9.8)
A patch was released for CVE-2026-1492, a critical (9.8 CVSS) privilege escalation flaw in the User Registration & Membership WordPress plugin; the vulnerability lets unauthenticated attackers create administrator accounts and take over sites. Mitigation: Update the plugin immediately to version 3.4.8 or 4.12.2.

CVE-2026-22719 — VMware Aria Operations Command Injection (High severity)
VMware patched CVE-2026-22719, a high-severity command injection flaw in Aria Operations, its cloud management platform; the vulnerability allows unauthenticated remote code execution during support-assisted migrations and affects versions 8 through 8.18.5 and 9 through 9.0.1, with patches and a workaround script available. Mitigation: Apply vendor patches or deploy the workaround script for affected versions.

Note: CVE-2025-32434 (PyTorch deserialization RCE) was covered in last week’s digest and will not be repeated here, though additional commentary on the vulnerability appeared this week.

Research Spotlight

Adversarial Evaluation of Multimodal LLMs on Typographic Attacks — Cloud Security Alliance research published March 8, 2026, demonstrates that under stealth constraints designed to conceal instructions from casual human inspection, typographic injection achieved a peak attack success rate of 64% in black-box settings against GPT-4V, Claude 3, Gemini, and LLaVA; the work introduced automated techniques for adaptive font scaling and background-aware rendering.

Autonomous Jailbreak Agents Achieve Near-Perfect Success Rates — Nature Communications research published March 2026 shows autonomous jailbreak agents achieve a 97.14% success rate; Claude 4 Sonnet showed a 2.86% harm score compared to GPT-4o (61.43%), Gemini 2.5 Flash (71.43%), and DeepSeek-V3 (90%), with the gap attributed to training on adversarial evaluation datasets like StrongREJECT.

Comprehensive Red-Teaming Study Across Four Leading LLMs — Evaluation of over 1,400 adversarial prompts across GPT-4, Claude 2, Mistral 7B, and Vicuna, analyzing results along model susceptibility, attack technique efficacy, and cross-model generalization. GPT-4 demonstrated the highest vulnerability with an attack success rate of 87.2%; prompt injections exploiting roleplay dynamics achieved the highest ASR (89.6%).

For broader context on adversarial ML research trends, see the NDSS Symposium 2026 Program, which features multiple sessions on AI security including evasion attacks, LLM safety alignment, and side-channel leakage in confidential VMs.

What This Means For You

Prepare for regulatory divergence on AI use restrictions. The Anthropic-Pentagon conflict demonstrates that AI governance is no longer purely a compliance exercise—it now involves navigating geopolitical red lines and contractual constraints that may conflict with government demands. Security and legal teams should collaboratively define internal “red lines” for AI deployment in sensitive contexts (autonomous decision-making, mass surveillance, lethal systems) and document those boundaries in vendor agreements and acceptable use policies. Organizations with government contracts should proactively clarify permitted and prohibited use cases before deployment, rather than assuming “all lawful use” clauses will be interpreted favorably.

Treat browser-integrated AI as a privileged attack surface. The CVE-2026-0628 Chrome Gemini vulnerability highlights that AI features embedded in browsers inherit the browser’s elevated privileges while introducing new attack vectors through malicious extensions and third-party integrations. Security teams should inventory all browser-based AI functionality, enforce strict extension policies, and implement runtime monitoring for anomalous interactions between extensions and AI components. Consider deploying browser isolation solutions for high-risk users and contexts where AI-enabled phishing or data exfiltration poses significant risk.

Implement defense-in-depth for prompt injection. With autonomous jailbreak agents achieving 97.14% success rates in recent research and no architectural fix on the horizon, organizations must layer multiple mitigations: input sanitization to strip or escape control characters and known injection patterns; output validation to detect and block anomalous completions; privilege separation between system instructions and user input using delimiters and structured prompts; runtime monitoring for behavioral anomalies like unexpected tool invocations or data access patterns; and rate limiting and abuse detection to slow down automated attack campaigns. No single control will suffice—only a defense-in-depth strategy that assumes evasion will slow determined attackers.

Tools and Resources

Promptfoo — Open-source CLI and library for LLM red-teaming and evaluation, now being integrated into OpenAI Frontier. Supports automated security testing for prompt injection, jailbreaks, and policy violations.
EU AI Act Compliance Checker — Interactive tool developed by the Future of Life Institute to help SMEs and startups understand whether they have legal obligations under the EU AI Act. Note: This is a work-in-progress simplification; consult legal counsel for binding advice.
MITRE ATLAS — Knowledge base of adversary tactics and techniques against AI systems, continuously updated with new case studies. MITRE ATLAS maps 14 tactics and 66 techniques to defend AI systems from threats like data poisoning and model theft; ATLAS added 14 new techniques in 2025 for AI agents, covering risks like prompt injection and memory manipulation attacks.
OWASP Top 10 for LLM Applications 2025 — Updated community-driven list of the most critical security risks in LLM deployments. Prompt Injection (LLM01:2025) remains the #1 vulnerability.

Key Highlights