Last Week in AI Security — Week of February 2, 2026
The UK AISI's International AI Safety Report 2026 highlights deepfake and cyberattack risks, while a ChatGPT wrapper app exposes 300M messages through a Firebase misconfiguration
Key Highlights
- International AI Safety Report 2026 warns of deepfake surge to 20% of fraud attempts
- Chat & Ask AI exposed 300M messages from 25M users via Firebase misconfiguration
- NIST releases Cyber AI Profile preliminary draft extending CSF 2.0 for AI systems
- Google Gemini indirect prompt injection vulnerability discovered by Miggo researchers
- OWASP Top 10 for Agentic Applications 2026 addresses autonomous AI attack surfaces
Executive Summary
Week 6 of 2026 delivered sobering evidence that AI security risks are materializing faster than defenses. The International AI Safety Report 2026, published by the UK’s AI Security Institute with support from over 30 nations, documented that general-purpose AI systems are “already causing real-world harm,” with deepfakes projected to feature in 20% of fraud attempts by year-end and AI-assisted cyberattacks showing a 91% year-over-year surge. The report’s most alarming finding: AI agents now identify 77% of vulnerabilities in real software (placing in the top 5% of capture-the-flag teams), and AI developers have confirmed that attackers are using their systems to generate code for cyberattacks.
Meanwhile, a Firebase misconfiguration exposed 300 million messages from over 25 million users of Chat & Ask AI, one of the most popular AI chat apps with more than 50 million users. The breach, discovered by independent security researcher Harry, highlights a systemic problem: 103 out of 200 iOS apps scanned had Firebase Security Rules set to public, collectively exposing tens of millions of stored files.
On the framework front, NIST released a preliminary draft of its Cybersecurity Framework Profile for Artificial Intelligence, extending CSF 2.0 to address AI-specific risks across three focus areas: securing AI systems, conducting AI-enabled defense, and thwarting adversarial cyberattacks using AI. Comments are open until January 30, 2026.
Top Stories
International AI Safety Report 2026 Maps AI Threat Landscape
The second International AI Safety Report, published in February 2026, continues the comprehensive review of the latest scientific research on the capabilities and risks of general-purpose AI systems. Chaired by Yoshua Bengio and developed with input from experts from over 30 countries, the report provides the most comprehensive assessment yet of AI risks ranging from deepfakes to biological weapons and cyberattacks.
The cybersecurity findings are particularly stark. Criminal groups and state-sponsored attackers actively use general-purpose AI systems to carry out or assist with their attacks. The report confirms that AI systems can now support attackers at several steps of the ‘cyberattack chain,’ with particularly strong evidence that they provide meaningful assistance in discovering software vulnerabilities. Pre-packaged AI tools and AI-generated ransomware are now appearing in underground marketplaces.
The deepfake threat is accelerating beyond academic concern. The report flags growing and emerging concerns around the use of AI in deepfakes, biological weapons, and cyberattacks, and ENISA forecasts that deepfakes will feature in 20% of fraud attempts by year-end 2026.
Aikido Security’s analysis of the report emphasizes deployment-time controls as the critical gap. Many of the most immediate problems occur once models connect to tools, credentials, and live environments.
Chat & Ask AI Breach Exposes 300 Million Messages
An independent security researcher uncovered a major data breach affecting Chat & Ask AI, one of the most popular AI chat apps on Google Play and Apple App Store, with more than 50 million users. The exposure stemmed from a Firebase misconfiguration—a preventable error that left Security Rules set to public, allowing anyone with the project URL to read, modify, or delete data without authentication.
The exposed data included user files containing their entire chat history, the models used, and other settings. These messages reportedly included, among other things, discussions of illegal activities and requests for suicide assistance. The breach also exposed data from other apps developed by Codeway, the developer behind Chat & Ask AI.
The discovery highlights a broader ecosystem problem. The researcher, Harry, found that 103 out of 200 iOS apps they scanned had the same issue, collectively exposing tens of millions of stored files. Harry created a website where users can check whether apps they use are affected; Codeway’s apps were removed from the list after the fix was confirmed.
This incident underscores a persistent theme in AI security: development is moving too fast for security and privacy to be baked in. Wrapper applications that plug into OpenAI, Anthropic, and Google models inherit all the security responsibilities of handling sensitive user conversations, responsibilities many small developers are ill-equipped to manage.
Google Gemini Indirect Prompt Injection Enables Calendar Data Theft
Security researchers from Miggo discovered an indirect prompt injection vulnerability in Google Gemini that bypassed authorization controls and exposed private calendar data. The attack vector is a malicious meeting invite: instructions embedded in the invite allowed attackers to evade multiple defense layers and access sensitive meeting details without authorization.
This marks another validation of the “lethal trifecta” threat model for AI agents: systems with access to private data, processing untrusted content, and possessing external communication capabilities. The attack demonstrates that even major vendors struggle to defend against indirect prompt injection when agents integrate with productivity tools.
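The trifecta can be made concrete as a configuration lint run before an agent is deployed. This is an illustrative sketch, not any vendor's actual control; the capability names are hypothetical:

```python
# Hypothetical sketch: flag agent configurations that combine all three
# "lethal trifecta" capabilities. Capability names are illustrative.
LETHAL_TRIFECTA = {"private_data_access", "untrusted_content", "external_comms"}

def trifecta_risk(capabilities: set[str]) -> bool:
    """Return True if the agent holds all three trifecta capabilities."""
    return LETHAL_TRIFECTA <= set(capabilities)

# A calendar-integrated assistant that reads invites (untrusted content),
# sees private events, and can send messages trips the check:
assert trifecta_risk({"private_data_access", "untrusted_content", "external_comms"})
assert not trifecta_risk({"private_data_access", "untrusted_content"})
```

Dropping any one leg of the trifecta (for example, removing external communication) structurally limits what a successful injection can exfiltrate.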
Adversa AI’s February 2026 security digest notes that theoretical risks of Generative AI are rapidly materializing into tangible security incidents, particularly regarding indirect prompt injection attacks targeting integrated systems like Google Gemini and Perplexity.
Framework & Standards Updates
NIST Releases Cyber AI Profile Preliminary Draft
The U.S. Department of Commerce’s National Institute of Standards and Technology has released an initial preliminary draft of the Cybersecurity Framework Profile for Artificial Intelligence (Cyber AI Profile or NIST IR 8596). The preliminary draft is designed as a voluntary framework that would extend the recently updated NIST Cybersecurity Framework 2.0 to new cybersecurity risks and opportunities introduced by AI.
The Cyber AI Profile is organized around three focus areas:
- Secure: Securing AI system components
- Defend: Conducting AI-enabled cyber defense
- Thwart: Thwarting adversarial cyberattacks using AI
The preliminary draft proposes to integrate AI-specific considerations across all six core functions of NIST CSF 2.0, with sample considerations provided for each of the three focus areas. Each consideration is assigned a proposed priority level: “1” for High Priority, “2” for Moderate Priority, and “3” for Foundational Priority.
Public comment period: Comments are open until January 30, 2026. Organizations should engage now to shape the final framework. Review the preliminary draft and submit feedback through the NIST NCCoE Cyber AI Profile project page.
OWASP Top 10 for Agentic Applications 2026
The Top 10 for Agentic Applications, released in December 2025, lists the highest-impact threats to autonomous agentic AI applications: systems that plan, decide, and act across tools and steps. It distills the top threats in a practical manner, building directly on prior OWASP work while highlighting agent-specific amplifiers such as delegation and multi-step execution.
The new list pivots from passive LLM risks to active agent behaviors. Agents are treated as principals with goals, tools, memory, and inter-agent protocols as distinct attack surfaces.
Key risks include:
- Goal Hijacking: Attackers manipulate the agent’s decision pathways or objectives through indirect means like documents or external data sources
- Tool Misuse: Unsafe use of legitimate tools due to ambiguous instructions or over-privileged access
- System Prompt Leakage: Exposure of operational parameters that attackers can exploit
- Agentic Tool Extraction (ATE): Multi-turn reconnaissance attacks to extract complete tool schemas
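The Tool Misuse risk in particular lends itself to a simple structural control: a dispatcher that only executes allow-listed tools with validated arguments. A minimal sketch, with hypothetical tool names and validators:

```python
# Illustrative least-privilege tool dispatcher: the agent may only invoke
# tools on an explicit allow-list, with per-tool argument validation.
# Tool names and validators are hypothetical examples.
from typing import Any, Callable

ALLOWED_TOOLS: dict[str, Callable[[dict[str, Any]], bool]] = {
    # tool name -> validator that approves the proposed arguments
    "search_docs": lambda args: isinstance(args.get("query"), str),
    "read_file": lambda args: str(args.get("path", "")).startswith("/sandbox/"),
}

def dispatch(tool: str, args: dict[str, Any]) -> None:
    """Reject any tool call that is not allow-listed and validated."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not on the allow-list")
    if not ALLOWED_TOOLS[tool](args):
        raise ValueError(f"arguments rejected for tool {tool!r}")
    # ...invoke the real tool implementation here...
```

Even if an attacker hijacks the agent's goal, the blast radius is bounded by what the dispatcher permits.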
Practical DevSecOps provides a comprehensive breakdown of how to secure agentic systems against these threats.
MITRE ATLAS Expands to Cover AI Agents
MITRE ATLAS maps 14 tactics and 66 techniques to defend AI systems from threats like data poisoning and model theft. ATLAS added 14 new techniques in 2025 for AI agents, covering risks like prompt injection and memory manipulation attacks.
The expansion addresses the security gap created by agentic AI systems. Organizations should incorporate ATLAS tactics into threat modeling exercises and map OWASP Top 10 for Agentic Applications risks to ATLAS techniques for comprehensive coverage.
Vulnerability Watch
CVE-2025-32434: Critical PyTorch Remote Code Execution (CVSS 9.3)
CVE-2025-32434 is a remote code execution (RCE) vulnerability in PyTorch, the open-source machine-learning framework, with a CVSS rating of 9.3. Under certain conditions, exploitation allows an attacker to run arbitrary code when a malicious AI model is loaded on the victim’s machine.
The flaw, discovered by security researcher Ji’an Zhou, undermines the safety of the torch.load() function even when configured with weights_only=True, a parameter long trusted to prevent unsafe deserialization.
Affected versions: PyTorch ≤2.5.1
Fixed in: PyTorch 2.6.0
Mitigation: The PyTorch team fixed CVE-2025-32434 in release 2.6.0. All previous versions, up to and including 2.5.1, remain vulnerable and should be updated as soon as possible. If updating isn’t possible, avoid loading untrusted models with torch.load() altogether, since the weights_only=True parameter no longer guarantees safety.
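A deployment pipeline can enforce the version floor before any model-loading code ships. A minimal sketch that parses the version string without importing torch (the 2.6.0 cutoff follows the advisory above):

```python
# Gate model loading on the patched PyTorch version. Pure-Python check so it
# can run in CI before torch is even installed; illustrative, not exhaustive
# (e.g., it ignores pre-release suffixes).
def torch_load_is_patched(version: str) -> bool:
    """True if `version` is >= 2.6, where CVE-2025-32434 is fixed."""
    parts = version.split("+")[0].split(".")  # drop local tags like "+cu121"
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= (2, 6)

assert torch_load_is_patched("2.6.0")
assert not torch_load_is_patched("2.5.1")
```

In a real pipeline the same check would run against `torch.__version__` at startup and refuse to call torch.load() on failure.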
The vulnerability highlights AI supply chain risks. Malicious models distributed via public repositories (e.g., Hugging Face Hub) could exploit this vulnerability at scale.
CVE-2026-22584: Salesforce uni2TS Remote Code Execution (High Severity)
The vulnerability affects uni2TS, a PyTorch library used by Salesforce’s Moirai foundation model for time series analysis. Exploitation allows remote code execution through malicious metadata in model files.
Salesforce acknowledged the issue, registered it as CVE-2026-22584 (rated High severity), and issued a fix on July 31. The fix implements an allow list and a strict validation check to ensure only explicitly permitted modules can be executed.
Mitigation: Update to the latest version of uni2TS with the July 31, 2025 patch applied.
CVE-2025-23304: NVIDIA NeMo Remote Code Execution (High Severity)
NVIDIA acknowledged this issue, registered it as CVE-2025-23304 (rated High severity), and issued a fix in NeMo version 2.3.2. The vulnerability allows arbitrary code execution through malicious .nemo files containing embedded .pickle files.
Affected versions: NeMo versions ≤2.3.1
Fixed in: NeMo 2.3.2
Note: As of January 2026, over 700 models on Hugging Face from a variety of developers are provided in NeMo format. Many of these models are among the most popular on Hugging Face, such as NVIDIA’s Parakeet.
CVE-2025-10155, CVE-2025-10156, CVE-2025-10157: PickleScan Bypass Vulnerabilities
Three critical vulnerabilities in PickleScan—the de facto security scanner for PyTorch models—allow attackers to bypass malware detection:
- CVE-2025-10155 (CVSS 9.3/7.8): File extension bypass
- CVE-2025-10156 (CVSS 9.3/7.5): ZIP archive CRC error bypass
- CVE-2025-10157 (CVSS 9.3/8.3): Unsafe globals check bypass
Successful exploitation of the aforementioned flaws could allow attackers to conceal malicious pickle payloads within files using common PyTorch extensions, deliberately introduce CRC errors into ZIP archives containing malicious models, or craft malicious PyTorch models with embedded pickle payloads to bypass the scanner.
Implication: Organizations relying solely on PickleScan for model security are at risk. Implement defense-in-depth strategies including sandboxed model loading environments and behavior monitoring.
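One cheap extra layer is scanning a pickle's opcode stream with the standard library's pickletools and flagging opcodes that can import or invoke code. This is a crude heuristic for illustration, not a PickleScan replacement, and it is subject to its own evasions:

```python
# Supplementary check: walk a pickle's opcodes and flag those that can
# reference or call arbitrary objects. A coarse heuristic, shown only to
# illustrate defense-in-depth alongside dedicated scanners.
import io
import pickle
import pickletools

SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ"}

def has_suspicious_opcodes(data: bytes) -> bool:
    return any(op.name in SUSPICIOUS_OPCODES
               for op, _, _ in pickletools.genops(io.BytesIO(data)))

# A plain data payload is clean; a pickled reference to a callable is flagged.
assert not has_suspicious_opcodes(pickle.dumps({"weights": [1.0, 2.0]}))
assert has_suspicious_opcodes(pickle.dumps(print))
```

Pairing an opcode scan with sandboxed loading and runtime behavior monitoring means no single bypass compromises the whole pipeline.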
Attack Research
Prompt Injection Research Advances
The researchers evaluated over 1,400 adversarial prompts across four LLMs (GPT-4, Claude 2, Mistral 7B, and Vicuna), analyzing results along several dimensions, including model susceptibility, attack technique efficacy, prompt behavior patterns, and cross-model generalization.
Key findings from new research published in May 2025:
- Prompt injections exploiting roleplay dynamics achieved the highest attack success rate (ASR) at 89.6%
- Logic trap attacks achieved an 81.4% ASR by exploiting conditional structures and moral dilemmas
- Encoding tricks achieved a 76.2% ASR by evading keyword-based filtering mechanisms
- Among the tested models, GPT-4 demonstrated the highest vulnerability with an ASR of 87.2%, confirming its powerful but permissive instruction-following nature
Simon Willison’s November 2025 analysis of two new prompt injection papers reinforces the grim reality: By systematically tuning and scaling general optimization techniques, attackers bypass 12 recent defenses with attack success rate above 90% for most. The “Human red-teaming setting” scored 100%, defeating all defenses.
The research paper “The Attacker Moves Second” demonstrates that static example attacks—single string prompts designed to bypass systems—are an almost useless way to evaluate these defenses. Adaptive attacks that iterate multiple times defeat nearly all published defenses.
Adversarial Machine Learning Threat Landscape
Adversarial machine learning will target defensive models by injecting poisoned data to create blind spots in anomaly detection. Gartner predicts that by 2026, 30% of enterprises will face AI-specific attacks, up from single digits today. Defenders, meanwhile, are expected to mature their use of AI from augmentation to orchestration.
A comprehensive meta-survey published in 2025 documents how adversarial vulnerabilities, first explored in computer vision, have expanded into graph neural networks, natural language processing, federated learning, and text-to-image models. Despite the varied attack surfaces, the survey finds notable commonalities across domains.
Industry Radar
OpenAI Launches Frontier Enterprise AI Platform
AI giant OpenAI announced the launch of OpenAI Frontier, an end-to-end platform designed for enterprises to build and manage AI agents. It’s an open platform, which means users can manage agents built outside of OpenAI too.
Announced February 5, 2026, Frontier represents OpenAI’s pivot toward enterprise infrastructure. OpenAI said Frontier was designed to work the same way companies manage human employees. Frontier offers an onboarding process for agents and a feedback loop that is meant to help them improve over time the same way a review might help an employee.
Early adopters include HP, Intuit, Oracle, State Farm, Thermo Fisher, and Uber.
Anthropic Releases Claude Opus 4.6
Anthropic released Claude Opus 4.6 in February 2026. The release comes as Anthropic intensifies its enterprise push, including partnerships with defense and intelligence agencies. In November 2025, Anthropic said that hackers sponsored by the Chinese government used Claude to perform automated cyberattacks against around 30 global organizations. The hackers tricked Claude into carrying out automated subtasks by pretending the work was for defensive testing.
This incident highlights the dual-use nature of advanced AI systems and the challenge of preventing adversarial use even with safety measures in place.
NVIDIA Debuts Nemotron 3 Open Model Family
NVIDIA announced the NVIDIA Nemotron™ 3 family of open models, data and libraries designed to power transparent, efficient and specialized agentic AI development across industries. The Nemotron 3 models — with Nano, Super and Ultra sizes — introduce a breakthrough hybrid latent mixture-of-experts (MoE) architecture.
Nemotron 3 Nano is available today; Super and Ultra variants are expected in the first half of 2026.
Policy Corner
EU AI Act Implementation Accelerates
The AIBOM (AI Bill of Materials) has become the mandatory standard under the 2026 EU AI Act and new NIST 800-53 overlays. Organizations operating in or serving European markets must prepare for binding obligations including transparency requirements, risk classifications, and documentation standards.
The EU AI Act’s General-Purpose AI Code of Practice is under development, with initial drafts expected in Q1 2026. Organizations should monitor EU AI Act updates and begin gap assessments against anticipated requirements.
Colorado AI Act Implementation Delayed
The Colorado AI Act (CAIA) requires risk management for AI-driven decisions in employment, housing, and healthcare and will take effect on June 30, 2026 (delayed from February 1, 2026). The delay provides additional time for organizations to implement required controls but does not change the fundamental obligations.
California’s multiple AI transparency and sectoral laws remain on track for 2026 implementation, creating a complex state-level compliance landscape.
US Federal AI Governance in Flux
The White House’s July 2025 AI Action Plan and a December 2025 executive order promote a minimally burdensome national framework and discourage state-level AI mandates. This tension with emerging state AI regulation creates legal uncertainty.
Organizations face the challenge of complying with state AI law obligations while accounting for potential federal preemption efforts. Flexible AI governance frameworks that can adapt to changing regulatory requirements are essential.
Research Spotlight
Comprehensive Review of Prompt Injection and Jailbreak Vulnerabilities
A comprehensive review published in January 2026 synthesizes research from 2023 to 2025, analyzing 45 key sources, industry security reports, and documented real-world exploits. It examines the taxonomy of prompt injection techniques, including direct jailbreaking and indirect injection through external content.
Key contributions include documentation of critical incidents:
- GitHub Copilot’s CVE-2025-53773 RCE and the CamoLeak CVSS 9.6 exploit
- Taxonomy spanning simple jailbreaking to sophisticated multi-stage exploits
- Critical evaluation of defense mechanisms, identifying why many fail against determined attackers
NIST Updates Adversarial Machine Learning Taxonomy
NIST published an updated edition of “Adversarial Machine Learning: A Taxonomy and Terminology” (NIST.AI.100-2e2025) in 2025. AML literature predominantly considers adversarial attacks against AI systems that could occur at either the training stage or the deployment stage. During the training stage, the attacker might control part of the training data, their labels, the model parameters, or the code of ML algorithms, resulting in different types of poisoning attacks. During the deployment stage, the ML model is already trained, and the adversary could mount evasion attacks to create integrity violations and change the ML model’s predictions, as well as privacy attacks.
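The taxonomy's deployment-stage evasion attack can be illustrated on a toy linear classifier: perturbing each feature against the sign of its weight (an FGSM-style step) shifts the score maximally under an L-infinity budget. A self-contained sketch with made-up weights:

```python
# Toy deployment-stage evasion attack on a linear classifier score = w.x + b.
# Moving each feature by eps against the sign of its weight is the optimal
# L-infinity-bounded perturbation for a linear model (an FGSM-style step).
def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def evade(w, x, eps):
    """Shift each feature against its weight's sign to lower the score."""
    return [xi - eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

w, b = [0.9, -0.4], 0.0
x = [1.0, 1.0]               # classified positive: score = 0.5
x_adv = evade(w, x, eps=0.6)  # small, bounded change per feature
assert score(w, b, x) > 0
assert score(w, b, x_adv) < 0  # same budgeted input now classified negative
```

Real evasion attacks against deep models use gradient estimates rather than known weights, but the geometry is the same: small, budgeted input changes that flip the prediction.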
The updated taxonomy provides essential vocabulary for threat modeling and risk assessments. Download the full document from NIST publications.
What This Means For You
The convergence of research findings, real-world incidents, and framework updates this week paints a clear picture: AI security is transitioning from theoretical risk to operational reality. Three immediate actions matter most:
1. Audit Firebase and Similar Backend Configurations Now
The Chat & Ask AI breach affecting 25 million users demonstrates that basic security hygiene failures have catastrophic consequences at AI scale. If you’re building AI applications with Firebase, Supabase, or similar backend-as-a-service platforms, verify your security rules today. In Harry’s scan, 103 out of 200 iOS apps had Firebase Security Rules set to public, collectively exposing tens of millions of stored files. Don’t assume your team got it right: verify.
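A first-pass check for your own project can be a single unauthenticated read against the Realtime Database REST endpoint. A hedged sketch using only the standard library; the project URL is a placeholder, and you should probe only projects you control:

```python
# Sketch: test whether YOUR OWN Firebase Realtime Database is readable with
# no credentials. The `opener` parameter exists so the check is testable
# offline; in production the default urllib opener does the real request.
import urllib.error
import urllib.request

def rtdb_publicly_readable(project_url: str, opener=urllib.request.urlopen) -> bool:
    """True if the database root can be read without authentication."""
    try:
        with opener(f"{project_url}/.json?shallow=true", timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # Locked-down Security Rules return 401/403 ("Permission denied")
        return False

# if rtdb_publicly_readable("https://your-project-id.firebaseio.com"):
#     print("WARNING: database is world-readable; fix your Security Rules")
```

An HTTP 200 here means anyone on the internet can read the data; the fix is tightening the project's Security Rules, not hiding the URL.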
2. Treat All AI Model Files as Untrusted Executables
The PyTorch CVE-2025-32434 vulnerability invalidates a core security assumption: that model weights can be safely loaded in isolation. Organizations downloading models from Hugging Face, GitHub, or other repositories must implement defense-in-depth strategies including sandboxed loading environments, behavior monitoring, and supply chain verification. The PickleScan bypass vulnerabilities prove that relying on a single security tool is insufficient.
Update your AI supply chain security checklist:
- Sandbox all model loading operations
- Implement runtime behavior monitoring for unexpected network calls or file system access
- Document model provenance and verify signatures where available
- Consider AIBOM requirements for models used in production systems
3. Accept That Prompt Injection Defenses Are Immature—Design Accordingly
The research is unequivocal: adaptive attacks defeat 90-100% of published prompt injection defenses. The Google Gemini calendar data theft and the documented 89.6% attack success rate for roleplay-based injections mean you cannot rely on prompt engineering or filtering alone to secure AI agents.
Follow Meta’s “Agents Rule of Two” principle: limit agent permissions so that even fully compromised agents cannot cause catastrophic damage. Implement human-in-the-loop controls for high-consequence actions. Use the OWASP Top 10 for Agentic Applications as your threat model and map controls to MITRE ATLAS techniques.
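The human-in-the-loop control can be as simple as a gate in the agent's action executor. An illustrative sketch with hypothetical action names and an injectable approval callback:

```python
# Sketch of a human-in-the-loop gate in the spirit of limiting agent blast
# radius: high-consequence actions require explicit approval before running.
# Action names and the approval callback are illustrative.
from typing import Callable

HIGH_CONSEQUENCE = {"send_email", "delete_records", "transfer_funds"}

def execute(action: str, approve: Callable[[str], bool]) -> str:
    """Run `action`, pausing for human approval when it is high-consequence."""
    if action in HIGH_CONSEQUENCE and not approve(action):
        return "blocked"
    return "executed"

assert execute("summarize_doc", approve=lambda a: False) == "executed"
assert execute("transfer_funds", approve=lambda a: False) == "blocked"
assert execute("transfer_funds", approve=lambda a: True) == "executed"
```

The point is architectural: even a fully compromised agent cannot complete a high-consequence action without a human decision outside its control.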
The international community’s message through the AI Safety Report is clear: General-purpose AI systems are already causing real-world harm. Advances in AI capabilities may pose further risks that have not yet materialized. The gap between AI capability advancement and security maturity is widening, not closing. Organizations that treat AI security as an afterthought will pay for it in 2026.
Tools and Resources
NIST Cyber AI Profile – Preliminary draft extending CSF 2.0 to AI systems. Public comment period open until January 30, 2026.
OWASP Top 10 for Agentic Applications 2026 – Essential threat model for autonomous AI systems. Addresses goal hijacking, tool misuse, and agentic tool extraction.
MITRE ATLAS – Adversary tactics and techniques for AI systems. Updated with 14 new techniques for AI agents in 2025.
PickleScan – Security scanner for PyTorch models. Note: Three bypass vulnerabilities disclosed; use as part of defense-in-depth, not sole control.
Firebase Security Rules Auditor – Check if your Firebase configuration is publicly exposed. Researcher Harry’s tool scans app store apps for this vulnerability.
International AI Safety Report 2026 – Comprehensive scientific assessment of AI capabilities and risks from 30+ nations. Essential reading for AI governance.
Adversa AI Monthly Security Digest – Curated collection of GenAI security incidents, research, and defenses. February 2026 edition covers prompt injection, data poisoning, and framework updates.