Last Week in AI Security — Week of March 30, 2026
Anthropic's Claude Code codebase accidentally exposed on npm in packaging error; China-linked attackers exploit Claude for cyberattacks; Unit 42 fuzzing research reveals LLM guardrail fragility at scale.
Key Highlights
- Anthropic ships entire Claude Code source (500K lines) to npm in misconfigured debug bundle
- China-linked hackers exploit Claude and DeepSeek in Mexican government attack stealing tax data
- Unit 42 genetic prompt fuzzing reveals LLMs remain vulnerable despite years of safety work
- Anthropic's leaked 'Mythos' model warns of unprecedented cybersecurity exploitation capabilities
- Fortinet patches CVE-2026-35616 (CVSS 9.1) actively exploited in FortiClient EMS
Executive Summary
The week of March 30, 2026 brought three converging developments that define the current AI security threat landscape: supply-chain exposure through accidental code publication, real-world exploitation of AI models by nation-state actors, and mounting evidence that fundamental guardrail vulnerabilities remain unsolved despite years of defensive investment. These stories illuminate how rapidly the attack surface is expanding as AI systems move from experimental to operational.
On March 31, Anthropic accidentally published the entire source code of Claude Code—nearly 2,000 files and 500,000 lines—to the public npm registry due to a single misconfigured debug file bundled into a routine update. A researcher tweeted a direct download link, and within hours 16 million people had descended on the thread, with the fastest GitHub mirror hitting 50,000 stars in under two hours. The incident coincided with a separate supply-chain attack on the axios npm package, meaning installations during a three-hour window on March 31 may have pulled in a Remote Access Trojan alongside legitimate Anthropic code. Anthropic stated no customer data or credentials were exposed, attributing the leak to human error in packaging rather than a security breach, but the timing with the axios compromise created a compounded risk window.
Meanwhile, real-world exploitation reached new levels of sophistication. A hacker used Anthropic’s Claude model alongside Chinese-made DeepSeek to attack Mexican government agencies, with the attacker asking Claude in Russian to create a web panel for managing hundreds of targets, according to chat logs shared by Gambit Security. In February, another hacker used Claude in attacks against Mexican government agencies, stealing sensitive tax and voter information, Bloomberg reported. These incidents demonstrate that frontier AI models are now operational tools in nation-state cyber operations, not theoretical risks.
Research from Palo Alto Networks’ Unit 42 underscores why these attacks succeed. Unit 42 researchers developed a genetic algorithm-inspired prompt fuzzing method that automatically generates variants of disallowed requests while preserving their original meaning, measuring guardrail fragility under systematic rephrasing; the technique uncovered weaknesses with evasion rates ranging from low single digits to high levels for specific keyword and model combinations. Despite years of investment in guardrails, prompt jailbreaking and prompt injection remain among the most well-known and actively discussed attack classes, with OWASP listing prompt injection as the top risk category for LLM applications in 2025. The convergence of these three stories—accidental exposure, active exploitation, and persistent vulnerability—signals that AI security has entered an operational crisis phase requiring immediate architectural response rather than incremental mitigation.
Top Stories
Anthropic Ships 500,000 Lines of Claude Code Source to Public npm Registry
On March 31, 2026, Anthropic shipped the entire source code of Claude Code to the public npm registry—not a snippet or teaser, but the full package: nearly 2,000 files and 500,000 lines of code, exposed to the world because of a single misconfigured debug file that got bundled into a routine update. The discovery went viral almost instantly. A researcher tweeted a direct download link, and within hours 16 million people had descended on the thread, with the fastest GitHub mirror hitting 50,000 stars in under two hours.
Anthropic stated that no customer data or credentials were exposed and that this was a packaging error caused by human error, not a security breach; however, a separate supply-chain attack hit the axios npm package just hours before the leak, meaning anyone who installed or updated Claude Code via npm on March 31 between 00:21 and 03:29 UTC may have pulled in a malicious version containing a Remote Access Trojan. The timing created a compounded risk window where developers could have simultaneously pulled compromised dependencies and exposed source code.
The operational security implications extend beyond a single incident. This incident occurred just days after the accidental reveal of “Claude Mythos,” Anthropic’s next-generation model, marking a concerning pattern for the safety-first AI lab’s own operational security. For security practitioners, the incident demonstrates that even organizations with strong security cultures face systemic challenges when handling complex build pipelines, package managers, and CI/CD workflows at scale. The lesson is clear: software supply-chain security for AI tooling requires the same rigor as production model deployment—including pre-publication scanning, automated secret detection, and separation of debug and release build artifacts.
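As a concrete illustration of pre-publication checks, the sketch below scans a package directory for debug artifacts and obvious hard-coded secrets before anything is pushed to a registry. The file patterns, regex, and `audit_package` helper are illustrative assumptions, not a known Anthropic or npm tool; a production pipeline should pair something like this with a dedicated secret scanner (e.g. gitleaks or trufflehog).

```python
import re
from pathlib import Path

# Illustrative patterns only; real pipelines need a much broader set
# and a dedicated secret-scanning tool in addition to this check.
DEBUG_PATTERNS = ("*.map", "*.debug.js", ".env", "debug.config.*")
SECRET_RE = re.compile(
    r"(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]", re.I
)

def audit_package(pkg_dir: str) -> list[str]:
    """Return findings that should block publication of the package."""
    root = Path(pkg_dir)
    findings = []
    # Debug/build artifacts that should never ship in a release bundle.
    for pattern in DEBUG_PATTERNS:
        for f in root.rglob(pattern):
            findings.append(f"debug artifact: {f.relative_to(root)}")
    # Naive scan of JS sources for hard-coded credentials.
    for f in root.rglob("*.js"):
        if SECRET_RE.search(f.read_text(errors="ignore")):
            findings.append(f"possible secret: {f.relative_to(root)}")
    return findings
```

Wiring a check like this into CI as a required pre-publish step (failing the build on any finding) is what "separation of debug and release build artifacts" looks like in practice.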
Nation-State Actors Exploit Claude and DeepSeek in Government Cyberattacks
Real-world exploitation of frontier AI models by sophisticated threat actors crossed a new threshold this week with documented evidence of Claude being used in government-targeted attacks. A hacker used Anthropic’s Claude model alongside Chinese-made DeepSeek, asking Claude in Russian to create a web panel for managing hundreds of targets. AI gives hackers of varying skill “superpowers” by simplifying the technical knowledge required to exploit systems, according to Eyal Sela, director of threat intelligence at Gambit Security.
In February, a hacker used Claude in a series of attacks against Mexican government agencies, stealing sensitive tax and voter information, Bloomberg reported. These incidents mark a shift from theoretical capability demonstrations to operational use of LLMs in targeted intrusions with confirmed data exfiltration. In one documented case, Anthropic discovered that a Chinese state-sponsored group had already been running a coordinated campaign using Claude Code to infiltrate roughly 30 organizations—including tech companies, financial institutions, and government agencies—before the company detected it; over the following 10 days, Anthropic investigated the full scope of the operation, banned the accounts involved, and notified affected organizations.
The technical progression is clear: attackers are using LLMs not just for initial reconnaissance or payload generation, but for orchestrating multi-target campaigns with persistent infrastructure management. China and other US adversaries are “hunting for any edge to improve the performance of their homegrown AI,” potentially mining any leaks of US AI models to try to “supercharge their own cyber weapons systems,” according to Joe Lin, co-founder and CEO at Twenty. Security teams should assume that any frontier model accessible via API or leaked source code will be integrated into adversary toolchains within weeks. Defense strategies must account for AI-assisted reconnaissance, payload generation, and campaign management as baseline capabilities available to moderately sophisticated threat actors.
Unit 42 Prompt Fuzzing Research Exposes Persistent Guardrail Fragility
Unit 42 researchers developed a genetic algorithm-inspired prompt fuzzing method that automatically generates variants of disallowed requests while preserving their original meaning, measuring guardrail fragility under systematic rephrasing. The technique uncovered weaknesses with evasion rates ranging from low single digits to high levels for specific keyword and model combinations. The research demonstrates that the fundamental challenge with LLM safety is not isolated failure cases but systematic vulnerability to automated variation.
The key difference from prior single-prompt jailbreak examples is scalability: small failure rates become reliable when attackers can automate at volume. This shifts the threat model from manual red-teaming exercises to industrial-scale bypass testing, where adversaries can iterate through thousands of semantic variations until they find a successful attack vector. Despite years of investment in defenses, prompt jailbreaking and prompt injection remain among the most well-known and actively discussed attack classes against LLM applications, with OWASP listing prompt injection as the top risk category for LLM applications in 2025.
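A generate-score-select loop of the kind this research describes can be sketched in a few lines. Everything below is a simplified illustration: the synonym table stands in for an LLM-based meaning-preserving rephraser, and the `fitness` callback stands in for querying the target model and scoring whether its guardrail refused; none of it is Unit 42's actual implementation.

```python
import random

# Toy mutation table; a real system would use an LLM to produce
# semantically equivalent rephrasings of the whole prompt.
SYNONYMS = {"build": ["construct", "assemble"], "weapon": ["device", "tool"]}

def mutate(prompt: str) -> str:
    """Swap one word for a synonym, if the table has one."""
    words = prompt.split()
    i = random.randrange(len(words))
    words[i] = random.choice(SYNONYMS.get(words[i], [words[i]]))
    return " ".join(words)

def fuzz(seed: str, fitness, generations: int = 20, pop_size: int = 8) -> str:
    """Evolve variants of `seed`, keeping the highest-scoring survivors.
    `fitness` is any callable returning a float; in an attack it would
    score whether the target model's guardrail was bypassed."""
    population = [seed]
    for _ in range(generations):
        children = [mutate(random.choice(population)) for _ in range(pop_size)]
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        if fitness(population[0]) >= 1.0:  # bypass found, stop early
            break
    return population[0]
```

The point of the loop is exactly the scalability argument above: even a per-variant success rate of a few percent compounds quickly when the search runs unattended across thousands of candidates.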
The U.K. National Cyber Security Centre has argued that prompt injection differs materially from SQL injection and may be harder to fix in a definitive way because LLMs do not enforce a clean separation between instructions and data within prompts. This architectural reality means that prompt injection may be a fundamental property of current LLM designs rather than a bug to be patched. Security teams deploying LLM-powered applications should implement defense-in-depth strategies including input validation, output sanitization, privilege separation, and continuous monitoring rather than relying solely on model-level guardrails. The Unit 42 research provides a sobering reminder that safety investments improve but do not eliminate these attack vectors.
Framework & Standards Updates
NIST published a preliminary draft of the Cybersecurity Framework Profile for Cyber AI, with a public comment period that ran through January 30, 2026; per the AI Action Plan, the AI RMF is currently in revision and will be included in a future version. The Cyber AI Profile addresses three focus areas: securing AI system components, implementing AI-enabled cybersecurity, and thwarting AI-enabled cyber attacks. Organizations seeking to align AI security programs with federal guidance should review the preliminary draft.
Key frameworks include the OWASP Top 10 for LLM Applications 2025 and the OWASP Top 10 for Agentic Applications 2026 for risk taxonomy, the NIST AI RMF with its GenAI Profile (AI 600-1) providing 200+ suggested risk management actions, MITRE ATLAS documenting adversary tactics and techniques specific to AI systems, ISO/IEC 42001 as the first certifiable international AI management system standard, and the EU AI Act establishing regulatory requirements with penalties up to EUR 35 million. Organizations should map their GenAI deployments to applicable frameworks based on geography, industry, and risk profile.
In ISACA’s 2026 Tech Trends and Priorities survey, only 13% of professionals say they are well-prepared to face GenAI risks, even though 62% of respondents identify AI and machine learning as top technology priorities for 2026. The gap between adoption priority and preparedness underscores the need for structured framework adoption to guide security implementation.
Vulnerability Watch
CVE-2026-35616: Fortinet FortiClient EMS Pre-Auth API Bypass (CVSS 9.1)
Fortinet released out-of-band patches for CVE-2026-35616 (CVSS score: 9.1), a critical security flaw impacting FortiClient EMS that has been exploited in the wild. The vulnerability is described as a pre-authentication API access bypass leading to privilege escalation: an improper access control vulnerability (CWE-284) in FortiClient EMS that may allow an attacker to escalate privileges. Organizations using FortiClient EMS should apply patches immediately and review authentication logs for signs of unauthorized API access during the vulnerable period.
CVE-2025-32434: PyTorch torch.load() RCE Despite weights_only=True (CVSS 9.3)
A researcher discovered CVE-2025-32434 in PyTorch, a Remote Code Execution (RCE) vulnerability rated 9.3 CVSS and categorized as critical; exploitation under certain conditions allows an attacker to run arbitrary code when a malicious AI model is being loaded on the victim’s computer. The vulnerability affects PyTorch versions 2.5.1 and prior, specifically in the torch.load() function when used with weights_only=True parameter; this parameter was previously considered a security safeguard, but the researcher proved that it could still be exploited to achieve remote code execution.
The PyTorch development team released version 2.6.0, which fixes CVE-2025-32434; all previous versions, up to and including 2.5.1, remain vulnerable and should be updated as soon as possible. Organizations loading models from untrusted sources or public repositories should upgrade to PyTorch 2.6.0 or later immediately. Users can update using pip (pip install --upgrade torch torchvision torchaudio) or conda (conda update pytorch torchvision torchaudio -c pytorch).
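A defensive load path can refuse untrusted checkpoints on vulnerable builds. The `is_patched` and `safe_load` helpers below are an illustrative sketch with simplified version parsing (it ignores pre-release tags; `packaging.version` is the robust choice where available). `torch.load(..., weights_only=True)` is the real API, but per the CVE it is only a meaningful safeguard on 2.6.0 and later.

```python
# First torch release in which CVE-2025-32434 is fixed.
MIN_SAFE = (2, 6, 0)

def is_patched(version: str) -> bool:
    """True if a torch version string is >= 2.6.0 (simplified parsing)."""
    parts = version.split("+")[0].split(".")[:3]
    nums = tuple(int("".join(ch for ch in p if ch.isdigit()) or "0") for p in parts)
    return (nums + (0, 0, 0))[:3] >= MIN_SAFE

def safe_load(path):
    import torch  # deferred so is_patched stays testable without torch installed
    if not is_patched(torch.__version__):
        raise RuntimeError("PyTorch < 2.6.0: refusing to load untrusted model")
    # weights_only=True alone was bypassable before 2.6.0 (CVE-2025-32434).
    return torch.load(path, weights_only=True)
```

Gating the load on the installed version turns "please upgrade" guidance into a hard failure mode, which is safer than silently loading on an unpatched host.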
Malicious npm Packages Exploit Redis and PostgreSQL for Persistent Implants
Researchers discovered 36 malicious packages in the npm registry disguised as Strapi CMS plugins but containing different payloads to facilitate Redis and PostgreSQL exploitation, deploy reverse shells, harvest credentials, and drop a persistent implant. Every package contains just three files (package.json, index.js, postinstall.js) and has no description or repository field. The supply-chain attack demonstrates continuing npm registry abuse for infrastructure compromise. Organizations should implement software composition analysis (SCA) tools, verify package provenance before installation, and monitor for unexpected network connections from development environments.
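The reported traits of these packages translate directly into triage heuristics. The sketch below flags a manifest for review based on those traits; the `triage_package` helper and its thresholds are illustrative assumptions, and a hit means "inspect before installing", not proof of malice.

```python
import json

# Lifecycle scripts that run code automatically on install; the reported
# malicious packages abused postinstall.js.
SUSPICIOUS_SCRIPTS = ("preinstall", "install", "postinstall")

def triage_package(manifest: str, file_list: list[str]) -> list[str]:
    """Flag red flags in an npm package.json string plus its file listing."""
    meta = json.loads(manifest)
    flags = []
    if not meta.get("description"):
        flags.append("missing description")
    if not meta.get("repository"):
        flags.append("missing repository")
    hooks = [s for s in meta.get("scripts", {}) if s in SUSPICIOUS_SCRIPTS]
    if hooks:
        flags.append(f"lifecycle hooks: {', '.join(hooks)}")
    if len(file_list) <= 3:
        flags.append("suspiciously small file set")
    return flags
```

Checks like these are cheap to run in CI before `npm install` of a new dependency and catch exactly the profile described above: tiny package, no metadata, install-time hook.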
Industry Radar
OpenAI announced plans to acquire agentic AI security testing firm Promptfoo on March 9, providing OpenAI with Promptfoo’s expertise in identifying and remediating security vulnerabilities in AI systems during development. The acquisition signals OpenAI’s recognition that security testing capabilities must be first-party competencies for frontier model developers rather than third-party services.
On March 11, Google announced that it had completed its acquisition of Wiz, the cloud and AI security platform, first announced in March 2025. The $32 billion deal is designed to help Google Cloud improve cloud security and enable organizations to build quickly and securely across any cloud or AI platform. The Wiz acquisition represents the largest cloud security deal in history and positions Google Cloud to offer integrated security across AI and traditional cloud workloads.
Mercor, an AI recruiting startup, confirmed a cyberattack linked to the compromise of the open-source LiteLLM project, in which hackers stole data and issued extortion demands. The breach highlights how vulnerabilities in shared AI infrastructure can ripple across companies and underscores the dangers of supply-chain attacks in open-source ecosystems. Organizations relying on open-source AI infrastructure should implement dependency pinning, automated vulnerability scanning, and incident response procedures for third-party compromise scenarios.
Policy Corner
The latest generation of frontier models from both Anthropic and OpenAI has crossed a threshold that the companies say poses new cybersecurity risks. In February, when OpenAI released GPT-5.3-Codex, the company said it was the first model it had classified as “high capability” for cybersecurity-related tasks under its Preparedness Framework—and the first it had directly trained to identify software vulnerabilities. Anthropic navigated similar risks with its Opus 4.6, released the same week.
In a leaked blog post, Anthropic warned that its upcoming AI model, called Mythos, and others like it can exploit vulnerabilities at an unprecedented pace; OpenAI warned in December that its upcoming models posed a “high” cybersecurity risk. Anthropic is privately warning government officials about the potential for large-scale cyberattacks enabled by Mythos. Every lab’s next model is expected to pose increasingly severe cybersecurity threats: the models arriving behind Mythos include the next OpenAI model and the next Google Gemini, with Chinese models following a few months behind.
No new legislative actions were reported this week. The regulatory focus remains on voluntary industry coordination around model capability disclosure and red-team testing ahead of public release.
Research Spotlight
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review
Published January 7, 2026, this comprehensive review synthesizes research from 2023 to 2025, analyzing 45 key sources, industry security reports, and documented real-world exploits; the paper examines the taxonomy of prompt injection techniques, including direct jailbreaking and indirect injection through external content, with the rise of AI agent systems and the Model Context Protocol (MCP) dramatically expanding attack surfaces. The review documents critical incidents including GitHub Copilot’s CVE-2025-53773 remote code execution vulnerability (CVSS 9.6) and ChatGPT’s Windows license key exposure. The paper provides a structured taxonomy for practitioners seeking to understand the full scope of prompt injection risks across both conversational and agentic AI deployments.
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs
Published May 7, 2025, this paper evaluated over 1,400 adversarial prompts across four LLMs (GPT-4, Claude 2, Mistral 7B, and Vicuna), analyzing results along several dimensions, including model susceptibility, attack technique efficacy, prompt behavior patterns, and cross-model generalization. Among the tested models, GPT-4 demonstrated the highest vulnerability, with an attack success rate (ASR) of 87.2%, confirming its powerful but permissive instruction-following nature; Claude 2 performed slightly better at filtering but still succumbed to 82.5% of attacks. Prompt injections exploiting roleplay dynamics (e.g., impersonation of fictional characters or hypothetical scenarios) achieved the highest ASR (89.6%), often bypassing filters by deflecting responsibility away from the model. The systematic evaluation provides baseline measurements for comparing guardrail effectiveness across model families and attack categories.
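For reference, ASR figures like these are straightforward to reproduce from raw red-team outcomes; a minimal tally helper (the naming and shape are our own illustration, not the paper's code) might look like:

```python
from collections import defaultdict

def asr_by_category(outcomes) -> dict[str, float]:
    """outcomes: iterable of (category, bypassed) pairs, bypassed a bool.
    Returns attack success rate per category, as a percentage."""
    totals, hits = defaultdict(int), defaultdict(int)
    for category, bypassed in outcomes:
        totals[category] += 1
        hits[category] += bypassed  # bool counts as 0 or 1
    return {c: 100.0 * hits[c] / totals[c] for c in totals}
```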
Adversarial Machine Learning: A Taxonomy and Terminology
NIST’s updated taxonomy document provides structured definitions for adversarial ML (AML) concepts relevant to AI security practitioners. The AML literature predominantly considers attacks against AI systems at either the training stage or the deployment stage. During training, an attacker might control part of the training data, the labels, the model parameters, or the code of the ML algorithms, resulting in different types of poisoning attacks. During deployment, the model is already trained, and an adversary can mount evasion attacks to create integrity violations and change the model’s predictions, as well as privacy attacks to infer sensitive information about the training data or the model itself. The document serves as a reference for organizations developing threat models and security architectures for ML systems.
What This Means For You
This week’s convergence of supply-chain exposure, nation-state exploitation, and persistent guardrail failures demands immediate action across three dimensions: operational security for AI development pipelines, threat modeling for adversarial AI use, and architectural defenses that assume model-level protections will fail.
First, treat AI development tooling as critical infrastructure. Anthropic’s accidental exposure of 500,000 lines of source code through a misconfigured debug file in a routine npm update demonstrates that even safety-focused organizations face systemic challenges in build pipeline security. Implement automated scanning for secrets and debug artifacts before package publication, enforce strict separation between development and release build configurations, and require multi-party approval for production releases. If your organization publishes AI-related packages to public registries, audit your CI/CD workflows this week for similar exposure risks.
Second, update threat models to account for adversarial AI use at scale. Documented evidence shows hackers using Claude and DeepSeek to attack Mexican government agencies, with AI “giving hackers of varying skill ‘superpowers’ by simplifying the technical knowledge required to exploit systems”. This is no longer theoretical: Chinese state-sponsored groups have used Claude Code to infiltrate roughly 30 organizations including tech companies, financial institutions, and government agencies. Security operations teams should assume that reconnaissance, vulnerability research, and payload generation are now AI-assisted for moderately sophisticated threat actors. Adjust detection strategies to identify rapid iteration patterns, automated scanning at unusual scale, and novel attack variants that suggest AI-assisted generation.
Third, implement defense-in-depth for LLM-powered applications because guardrails remain fundamentally fragile. Unit 42’s genetic fuzzing research uncovered guardrail weaknesses with evasion rates ranging from low single digits to high levels in specific keyword and model combinations, proving that small failure rates become reliable when attackers can automate at volume. Do not rely solely on model-level safety mechanisms. Every LLM-powered application with meaningful privilege should implement: input validation that blocks or sanitizes high-risk patterns before they reach the model; output validation that prevents execution of potentially malicious content generated by the model; privilege separation that limits the blast radius when prompt injection succeeds; and continuous monitoring for anomalous behavior patterns indicating successful bypasses. The U.K. National Cyber Security Centre argues that prompt injection may be harder to fix in a definitive way because LLMs do not enforce a clean separation between instructions and data within prompts, making architectural controls essential rather than optional.
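The four layers above can be sketched as a thin wrapper around the model call. The regexes, the `call_model` placeholder, and the logging below are illustrative assumptions rather than a vetted filter set; real deployments need far richer validation, but the layering pattern is the point.

```python
import re

# Deliberately crude patterns; production filters use classifiers and
# allowlists, not two regexes.
INJECTION_RE = re.compile(r"ignore (all|previous) instructions", re.I)
URL_RE = re.compile(r"https?://\S+")

def guarded_query(user_input: str, call_model) -> str:
    """Run a model call behind input, output, and monitoring layers."""
    # Layer 1: input validation before the prompt reaches the model.
    if INJECTION_RE.search(user_input):
        raise ValueError("blocked: suspected prompt injection")
    # Layer 2: privilege separation is enforced around this call, e.g.
    # `call_model` here has no tool access or credentials.
    reply = call_model(user_input)
    # Layer 3: output validation, stripping untrusted links before display.
    reply = URL_RE.sub("[link removed]", reply)
    # Layer 4: monitoring hook feeding anomaly detection.
    print(f"audit: in={len(user_input)}B out={len(reply)}B")
    return reply
```

The design choice worth noting is that each layer assumes the one before it will eventually fail, which is exactly the posture the fuzzing results demand.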
Organizations deploying agentic AI systems face compounded risk. The rise of AI agent systems and the Model Context Protocol (MCP) has dramatically expanded attack surfaces, introducing vulnerabilities such as tool poisoning and credential theft. Before deploying agents with tool access or elevated privileges, implement human-in-the-loop approval for high-consequence actions, cryptographic verification of tool invocations, and comprehensive audit logging of agent decisions. The operational security, threat modeling, and defense-in-depth priorities outlined above should be considered baseline requirements for any production AI deployment in 2026.
Tools and Resources
No new open-source tools or frameworks were announced this week. Security teams seeking to implement prompt injection defenses should review the OWASP Top 10 for LLM Applications 2025 and the OWASP Top 10 for Agentic Applications 2026 for current mitigation guidance. Organizations building internal red-teaming capabilities may benefit from reviewing the genetic fuzzing methodology described in Unit 42’s research, though no public implementation was released.
For vulnerability scanning of ML frameworks, existing tools like Protect AI’s Huntr platform and static analysis tools should be prioritized for immediate deployment. Organizations using PyTorch should prioritize upgrading to version 2.6.0 to remediate CVE-2025-32434 before scanning for additional supply-chain risks.