Last Week in AI Security — Week of April 6, 2026

Executive Summary

The week of April 6, 2026 marked an inflection point in AI security as Anthropic’s Claude Mythos Preview demonstrated that frontier AI models have found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. The convergence of three critical developments defines the current threat landscape: the operational deployment of AI models capable of autonomous vulnerability discovery at scale, the weaponization of on-device AI systems through prompt injection techniques, and the active exploitation of vulnerabilities in AI infrastructure frameworks that protect millions of enterprise deployments.

Anthropic announced it would withhold its latest model, Mythos Preview, from the public, citing unprecedented vulnerability-discovery capabilities that could cause significant damage in the wrong hands, instead sharing the model with a limited group of tech giants and partners to help shore up their defenses. In the wake of Anthropic’s announcement about Mythos Preview, Treasury Secretary Scott Bessent convened a meeting with major financial institutions this week, underscoring the gravity with which government and industry are responding to AI-assisted offensive capabilities.

Meanwhile, researchers demonstrated that Apple Intelligence’s on-device AI can be manipulated by attackers using prompt injection techniques, with RSAC Research unveiling a method achieving a 76% success rate in 100 tests by employing adversarial prompts and Unicode obfuscation, findings shared with Apple on October 15, 2025. This attack represents a fundamental challenge to the assumption that on-device AI offers inherently superior security compared to cloud-based alternatives.

Simultaneously, cybersecurity researchers disclosed three security vulnerabilities impacting LangChain and LangGraph that expose filesystem data, environment secrets, and conversation history, with each vulnerability exposing a different class of enterprise data. These developments collectively illustrate that AI security has entered a phase where both AI capabilities and AI infrastructure are under active attack, requiring immediate architectural response across the industry.

Top Stories

Anthropic Project Glasswing: Mythos Preview Finds Thousands of Zero-Days in Critical Infrastructure

Anthropic is committing up to $100M in usage credits for Mythos Preview across defensive efforts, as well as $4M in direct donations to open-source security organizations, as part of Project Glasswing. The initiative represents an urgent attempt to leverage frontier AI’s unprecedented vulnerability discovery capabilities for defensive purposes before these capabilities proliferate to adversaries.

Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. The powerful cyber capabilities of Claude Mythos Preview are a result of its strong agentic coding and reasoning skills, with the model achieving the highest scores of any model yet developed on a variety of software coding tasks.

As part of Project Glasswing, launch partners will use Mythos Preview as part of their defensive security work, and Anthropic has extended access to a group of over 40 additional organizations that build or maintain critical software infrastructure so they can use the model to scan and secure both first-party and open-source systems.

The timing carries significant implications. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely, with the fallout for economies, public safety, and national security potentially severe. “We should be planning for a world where, within six months to 12 months, capabilities like this could be broadly distributed or made broadly available,” Graham told NBC News, adding “If you step back, that’s a pretty crazy time frame, where usually preparations for things like this take many years.”

Security experts warn that Mythos is not simply good at finding vulnerabilities but also at chaining them together into complicated exploits that can be devastating hacking tools. The model’s capabilities represent a fundamental shift from manual vulnerability discovery to AI-assisted offensive operations at unprecedented scale and speed.

Apple Intelligence Jailbroken via Prompt Injection at 76% Success Rate

RSAC researchers tested their attack with 100 random prompts and succeeded 76% of the time, with RSAC estimating that between 100,000 and 1 million Apple customers were already using apps vulnerable to the attack when the vulnerability was discovered.

To hack Apple Intelligence, RSAC had to solve two problems: find an input that causes the local LLM to execute an adversarially-chosen task and bypass the filters; RSAC researcher Dario Pasquini and colleagues had previously discovered a “Neural Exec,” an adversarial input designed to trick an LLM into performing an arbitrary task.

The attack’s technical foundation rests on Unicode manipulation combined with adversarial prompt engineering. As of this writing, Apple doesn’t provide many details, but based on research, RSAC thinks that Apple forces all inputs to the local model to go through input and output filters designed to eliminate malicious input and prevent the LLM from returning undesirable output. The RSAC team successfully bypassed these protections using Unicode’s Right-to-left-Override function to obfuscate adversarial payloads.

The RSAC Research Lab disclosed this attack to Apple on October 15, 2025, through the Apple Security Research portal; Apple has since hardened the affected systems, and protections were rolled out in iOS 26.4 and macOS 26.4, with RSAC noting no evidence of this vulnerability being exploited by attackers in the wild.

Researchers showed the system can be pushed into generating offensive or unintended responses, and the risk goes well beyond text output, as Apple Intelligence connects directly to apps through system APIs, so manipulated responses could affect how apps behave or expose sensitive data, with RSAC estimating that between 100,000 and 1 million users may already be using apps with potential exposure.

The findings challenge fundamental assumptions about on-device AI security. While Apple’s hybrid architecture — with smaller models running locally and complex processing through Private Cloud Compute — was framed as a privacy-focused alternative to cloud-based systems, the research demonstrates that local deployment does not automatically confer security advantages against adversarial inputs.

LangChain and vLLM Vulnerabilities Expose Enterprise AI Infrastructure

Three critical vulnerabilities in LangChain and LangGraph emerged as active threats to enterprise AI deployments. CVE-2026-34070 (CVSS score: 7.5) is a path traversal vulnerability in LangChain allowing access to arbitrary files via its prompt-loading API; CVE-2025-68664 (CVSS score: 9.3) is a deserialization of untrusted data vulnerability leaking API keys and environment secrets; CVE-2025-67644 (CVSS score: 7.3) is an SQL injection vulnerability in LangGraph SQLite checkpoint implementation allowing arbitrary SQL queries.

LangChain-Core and LangGraph have been downloaded more than 23 million and 9 million times last week alone, with each vulnerability exposing a different class of enterprise data: filesystem files, environment secrets, and conversation history.

The serialization vulnerability represents a particularly insidious attack vector. Attackers who control serialized data can extract environment variable secrets by injecting {“lc”: 1, “type”: “secret”, “id”: [“ENV_VAR”]} to load environment variables during deserialization when secrets_from_env=True, which was the old default.

The Langflow vulnerability, tracked as CVE-2026-33017 (CVSS 9.3), allows unauthenticated remote code execution through the /api/v1/build_public_tmp/{flow_id}/flow endpoint, with the flaw going from advisory to weaponized exploitation in 20 hours. CVE-2026-33017 went from advisory to weaponized exploitation in 20 hours with no public proof-of-concept required, as the advisory description contained enough detail for attackers to build working exploits.

Concurrently, vLLM faced multiple critical security advisories. GitHub security advisories show hardcoded trust_remote_code=True in NemotronVL and KimiK25 bypasses user security opt-out (GHSA-7972-pg2x-xr59, High severity), and RCE via auto_map dynamic module loading during model initialization (GHSA-2pc9-4j83-qjmr, Critical severity) published February 2, 2026.

The vulnerability breaks the documented trust_remote_code safety boundary in a core model-loading utility, with the vulnerable code living in a common loading path, meaning any application, service, CI job, or developer machine that uses vllm’s transformer utilities to load configs can be affected, and a successful exploit can execute arbitrary commands on the host.

Framework & Standards Updates

NIST AI RMF Profile for Critical Infrastructure Released

On April 7, 2026, NIST released a concept note for an AI RMF Profile on Trustworthy AI in Critical Infrastructure, which will guide critical infrastructure operators towards specific risk management practices to consider when engaging AI-enabled capabilities.

The release follows December 2025’s preliminary draft of the Cybersecurity Framework Profile for Artificial Intelligence. NIST scheduled a workshop for January 14, 2026 to discuss the Preliminary Draft of the Cyber AI Profile, with the profile open for comments until January 30, 2026.

MITRE ATLAS Expands with Agentic AI Coverage

The rapid expansion from 15 tactics in October 2025 to 16 tactics and 84 techniques by February 2026 demonstrates ATLAS’s commitment to keeping pace with AI evolution; as of version 5.1.0 (November 2025), the framework contains 16 tactics, 84 techniques, 56 sub-techniques, 32 mitigations, and 42 real-world case studies, with the February 2026 update (v5.4.0) adding further agent-focused techniques.

In the first MITRE ATLAS update of 2026, Zenity researchers contributed substantially to expanding the framework’s coverage of agentic AI threats, adding clarity and rigor to a threat class that has previously been poorly defined.

The January 2026 ATLAS update (v5.3.0) added three new case studies specifically covering MCP server compromises, indirect prompt injection via MCP channels, and malicious AI agent deployment, with security teams advised to validate all MCP server configurations, restrict tool permissions to least privilege, and monitor tool invocation patterns for anomalies.

OWASP Releases Top 10 for Agentic Applications 2026

The 2026 OWASP Top 10 for Agentic Applications addresses the emerging security challenges of autonomous AI systems; unlike traditional LLM applications, agentic systems combine reasoning, memory, tools, and multi-step execution, introducing new classes of vulnerabilities, with the 2026 edition focusing on failures arising from goal misalignment, tool misuse, delegated trust, inter-agent communication, persistent memory, and emergent autonomous behavior.

Prompt injection is ranked #1 in the OWASP Top 10 for LLM Applications 2025, showing it remains the leading AI app security risk entering 2026.

CIS Report: Prompt Injection “Inherent Threat” to Generative AI

The Center for Internet Security released a report, Prompt Injections: The Inherent Threat to Generative AI, explaining how cyber threat actors can manipulate AI systems by hiding malicious instructions in documents, emails, websites, and other data that AI tools are allowed to access, potentially leading to stolen sensitive data, unauthorized system access, and disrupted operations.

A CIS report, Prompt Injections: The Inherent Threat to Generative AI, identifies prompt injection as a persistent concern tied to adoption, with a 2025 NASCIO survey of 51 state and territorial CIOs finding that 82% reported employees using GenAI in daily work, up from 53% the prior year.

Vulnerability Watch

CVE-2026-34070: LangChain Path Traversal (CVSS 7.5)

A path traversal vulnerability in LangChain (“langchain_core/prompts/loading.py”) allows access to arbitrary files without any validation via its prompt-loading API by supplying a specially crafted prompt template.

Mitigation: CVE-2026-34070 requires langchain-core version 1.2.22 or later.

CVE-2025-68664: LangChain Serialization Injection (CVSS 9.3)

A deserialization of untrusted data vulnerability in LangChain leaks API keys and environment secrets by passing as input a data structure that tricks the application into interpreting it as an already serialized LangChain object rather than regular user data.

The core vulnerability was in dumps() and dumpd(): these functions failed to escape user-controlled dictionaries containing ‘lc’ keys; when this unescaped data was later deserialized via load() or loads(), the injected structures were treated as legitimate LangChain objects rather than plain user data.

Mitigation: CVE-2025-68664 requires versions 0.3.81 or 1.2.5 and above.

CVE-2025-67644: LangGraph SQL Injection (CVSS 7.3)

An SQL injection vulnerability in LangGraph SQLite checkpoint implementation allows an attacker to manipulate SQL queries through metadata filter keys and run arbitrary SQL queries against the database.

Mitigation: CVE-2025-67644 requires langgraph-checkpoint-sqlite version 3.0.1.

CVE-2026-33017: Langflow RCE (CVSS 9.3)

Tracked as CVE-2026-33017 (CVSS 9.3), the vulnerability allows unauthenticated remote code execution through the /api/v1/build_public_tmp/{flow_id}/flow endpoint. The vulnerability has come under active exploitation within 20 hours of public disclosure.

vLLM Multiple Critical Advisories

Critical advisories include RCE via auto_map dynamic module loading during model initialization (GHSA-2pc9-4j83-qjmr, published Feb 2, 2026), vLLM RCE In Video Processing (GHSA-4r2x-xpjr-7cvv, published Jan 27, 2026), and SSRF Protection Bypass in vLLM (GHSA-v359-jj2v-j536, published Mar 26, 2026).

CVE-2026-34621: Adobe Acrobat Reader Active Exploitation (CVSS 8.6)

Adobe released emergency updates to fix a critical security flaw in Acrobat Reader that has come under active exploitation in the wild; the vulnerability, assigned the CVE identifier CVE-2026-34621, carries a CVSS score of 8.6 out of 10.0, with successful exploitation allowing an attacker to run malicious code on affected installations.

Attack Research

Prompt Injection Dominates 2026 Threat Landscape

Prompt injection ranks #1 on the OWASP Top 10 for LLM Applications 2025 because it exploits a fundamental architectural weakness: LLMs cannot reliably distinguish between trusted instructions and untrusted data, with the International AI Safety Report 2026 finding that sophisticated attackers bypass the best-defended models approximately 50% of the time with just 10 attempts.

The UK’s National Cyber Security Centre issued a formal assessment in December 2025 warning that prompt injection may never be fully mitigated the way SQL injection was, characterising LLMs as “inherently confusable deputies,” with Bruce Schneier and Barath Raghavan reinforcing in IEEE Spectrum in January 2026 that prompt injection is unlikely to ever be fully solved with current LLM architectures.

New research demonstrates the persistence of the threat. A paper published April 4, 2026 in Scientific Reports (DOI: https://doi.org/10.1038/s41598-026-43883-0) examines detection and analysis of prompt injection in Indian multilingual large language models.

An IEEE Symposium on Security and Privacy 2026 paper presents the first large-scale study of 17 third-party chatbot plugins used by over 10,000 public websites, uncovering previously unknown prompt injection risks in practice, with 8 of these plugins (used by 8,000 websites) failing to enforce the integrity of the conversation history transmitted in network requests.

Large Reasoning Models Enable Autonomous Jailbreaking

A Nature Communications paper (February 5, 2026) shows that the persuasive capabilities of large reasoning models simplify and scale jailbreaking; researchers evaluated four LRMs (DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, Qwen3 235B) to act as autonomous adversaries conducting multi-turn conversations with nine widely used target models, with LRMs receiving instructions via a system prompt before proceeding to planning and executing jailbreaks with no further supervision, yielding an overall jailbreak success rate across all model combinations of 97.14%.

Advanced automated attacks routinely achieve 90-99% success on open-weight models, while black-box attacks reach 80-94% effectiveness on proprietary models, with agent-driven multi-turn attacks demonstrating 95% success by decomposing harmful queries across conversation turns.

Industry Radar

Anthropic Raises AI Security Investment Bar

Anthropic is committing up to $100M in usage credits for Mythos Preview across defensive efforts, as well as $4M in direct donations to open-source security organizations. The investment represents the largest single commitment to defensive AI security to date.

Treasury Convenes Financial Sector Meeting on AI Threats

In the wake of Anthropic’s announcement about Mythos Preview, Treasury Secretary Scott Bessent convened a meeting with major financial institutions this week, marking the first Cabinet-level response to AI-driven vulnerability discovery capabilities.

Policy Corner

Government AI Adoption Accelerates Despite Security Concerns

A 2025 NASCIO survey of 51 state and territorial CIOs found that 82% reported employees using GenAI in daily work, up from 53% the prior year, with most organizations having moved beyond early testing with widespread pilot programs, proofs of concept, and employee training already in place; AI, GenAI, and agentic AI ranked as the number one policy and technology priority for 2026.

EU AI Act Obligations Active

EU AI Act GPAI (General Purpose AI) obligations became active in August 2025, requiring adversarial testing for systemic-risk AI systems and cybersecurity protection against unauthorized access.

Research Spotlight

Detection and Analysis of Prompt Injection in Indian Multilingual Large Language Models

Srinivasan et al., Scientific Reports, April 4, 2026

The paper addresses how LLMs can be made safe and robust with a focus on prompt injections, noting that most available literature and defense systems are focused on English language prompts, with no publicly available datasets or defense mechanisms for detecting prompt injection in Indian regional or code-mixed languages.

When AI Meets the Web: Prompt Injection Risks in Third-Party AI Chatbot Plugins

IEEE Symposium on Security and Privacy 2026

The paper presents the first large-scale study of prompt injection attacks in simpler LLM applications like customer service chatbots, which are widespread on the web yet their security posture remains poorly understood, examining 17 third-party chatbot plugins used by over 10,000 public websites that act as intermediaries to commercial LLM APIs.

Large Reasoning Models are Autonomous Jailbreak Agents

Nature Communications, February 5, 2026

The study shows that jailbreaking has traditionally required complex technical procedures or specialized expertise, but the persuasive capabilities of large reasoning models simplify and scale jailbreaking, converting it into an inexpensive activity accessible to non-experts, with experiments yielding an overall jailbreak success rate of 97.14%.

Jailbreaking LLMs: A Survey of Attacks, Defenses and Evaluation

TechRxiv, January 6, 2026

The survey provides the first unified systematization of the LLM security threat landscape (2022-2025), finding a persistent asymmetry between attack sophistication and defensive capability, with advanced automated attacks routinely achieving 90-99% success on open-weight models while black-box attacks reach 80-94% effectiveness on proprietary models.

AB Jailbreaking: A Novel Hybrid Framework for Exploitation of Adversarial Vulnerabilities in LLMs

Scientific Reports, April 2026

The paper proposes AB-JB, a three-stage hybrid jailbreak framework that combines black-box semantic adversarial prompt variant generation with a compact, regularised embedding-level suffix optimiser, using an attacker LLM to produce multiple semantically diverse adversarial variants and a judge LLM to score and filter variants into a high-quality candidate pool.

What This Means For You

The convergence of AI-discovered vulnerabilities, successful attacks on on-device AI, and exploitation of AI infrastructure frameworks signals that AI security has moved from theoretical concern to operational crisis. Organizations must act immediately across three critical dimensions.

First, assume AI-assisted attackers are already operational. Given the rate of AI progress, it will not be long before capabilities like Mythos Preview proliferate, potentially beyond actors who are committed to deploying them safely. This six-to-twelve-month window means your vulnerability management cadence must accelerate dramatically. Traditional quarterly patching cycles are incompatible with AI-discovered zero-days and 20-hour exploit development timelines.

Second, treat AI infrastructure as critical infrastructure. The LangChain and vLLM vulnerabilities demonstrate that AI frameworks are not development tools—they are production systems carrying the same risk profile as databases and authentication systems. If you’re running LangChain, LangGraph, vLLM, or similar frameworks in production: update immediately to patched versions, audit all workflows that pass untrusted data through serialization layers, implement runtime monitoring of tool calls and outputs, and validate that trust_remote_code parameters are explicitly set rather than relying on defaults.

Third, test your AI deployments with adversarial rigor. The Apple Intelligence research shows that even sophisticated on-device safety measures can be bypassed with 76% success rates using techniques accessible to motivated attackers. Security teams must conduct multi-turn red team assessments that capture tool calls and tool outputs, test indirect injection through RAG pipelines and document ingestion as well as direct user attacks, and validate that refusals are enforced at the action layer, not just the text response layer. The “refusal-enablement gap”—where models refuse in natural language while still providing attack steps—represents a particularly dangerous failure mode that text-level monitoring will miss entirely.

Tools and Resources

MITRE ATLAS v5.4.0 — The framework now contains 16 tactics, 84 techniques, 56 sub-techniques, 32 mitigations, and 42 real-world case studies, with the February 2026 v5.4.0 update adding techniques including “Publish Poisoned AI Agent Tool” and “Escape to Host.” Available at atlas.mitre.org

OWASP Top 10 for Agentic Applications 2026 — The 2026 edition focuses on failures arising from goal misalignment, tool misuse, delegated trust, inter-agent communication, persistent memory, and emergent autonomous behavior. Documentation at genai.owasp.org

CIS Prompt Injection Report — The report explains how cyber threat actors can manipulate AI systems by hiding malicious instructions in documents, emails, websites, and other data that AI tools are allowed to access. Available at cisecurity.org

NIST AI RMF Critical Infrastructure Profile — On April 7, 2026, NIST released a concept note for an AI RMF Profile on Trustworthy AI in Critical Infrastructure, which will guide critical infrastructure operators towards specific risk management practices. Available at nist.gov

Key Highlights