Last Week in AI Security — Week of February 16, 2026
International AI Safety Report 2026 published; AI-assisted threat actor compromised 600+ FortiGate devices; Google Translate Gemini prompt injection discovered.
Key Highlights
- AI-powered attack compromised 600+ FortiGate firewalls across 55 countries
- International AI Safety Report 2026 released by 100+ experts from 30+ nations
- Google Translate Gemini mode exploited via prompt injection vulnerability
- Anthropic released Claude Code Security tool for automated vulnerability scanning
- NIST published preliminary draft of Cyber AI Profile (NIST IR 8596 iprd)
Executive Summary
This week brought stark evidence of AI’s dual role in cybersecurity: as both weapon and defense. A Russian-speaking threat actor used multiple commercial generative AI services to compromise over 600 FortiGate devices across 55 countries between January 11 and February 18, 2026, marking one of the first documented large-scale cyberattacks significantly augmented by AI capabilities. Meanwhile, the second International AI Safety Report was published in February 2026, representing the largest global collaboration on AI safety to date with contributions from over 100 experts nominated by more than 30 countries and international organizations.
The security community also grappled with fundamental vulnerabilities in AI systems themselves. Google Translate’s Gemini integration was exposed to prompt injection attacks that bypass translation to generate dangerous content through simple text commands, with the vulnerability remaining active in production as of February 10. On the defensive side, frameworks and tools matured rapidly: NIST’s Cyber AI Profile (NIST Community Profile) became available for public comment through January 30th, while Anthropic released Claude Code Security in limited research preview, which scans codebases for security vulnerabilities and suggests targeted software patches for human review.
These developments underscore a critical inflection point: AI systems are simultaneously becoming essential infrastructure and prime targets, requiring security practitioners to adopt AI-specific defenses while defending against AI-enabled attacks.
Top Stories
AI-Assisted Threat Actor Compromises 600+ FortiGate Devices in 55 Countries
A financially motivated, Russian-speaking threat actor leveraged multiple commercial generative AI services to compromise over 600 FortiGate devices located in 55 countries, with the activity observed between January 11 and February 18, 2026, according to findings from Amazon Threat Intelligence. This represents one of the first confirmed cases of commercial AI services being weaponized at scale in offensive cyber operations.
The attack was confirmed on February 21, marking a landmark case of AI-enabled offensive operations targeting enterprise network infrastructure at scale. While specific technical details about how the AI services were utilized remain limited in public reporting, the incident demonstrates the practical application of generative AI in reducing barriers to entry for sophisticated network intrusions.
The campaign’s significance extends beyond the raw number of compromised devices. It validates long-standing warnings from the security community about the democratization of attack capabilities through AI. The fact that a financially motivated actor—not necessarily a nation-state with advanced capabilities—could orchestrate operations across 55 countries suggests AI is already altering the cost-benefit calculus of offensive cyber operations.
International AI Safety Report 2026: Global Assessment of AI Risks
The second International AI Safety Report was published on February 3, 2026, representing the largest global collaboration on AI safety to date, led by Turing Award winner Yoshua Bengio and backed by an Expert Advisory Panel with nominees from more than 30 countries and international organizations, and authored by over 100 AI experts. The report provides a comprehensive, science-based assessment of general-purpose AI capabilities and risks without making specific policy recommendations, instead synthesizing scientific evidence to inform decision-makers.
The report found increasing and emerging concerns around the use of artificial intelligence in deepfakes, biological weapons, and cyberattacks. On the cybersecurity front, general-purpose AI can help enable cyberattacks by identifying software vulnerabilities and writing and executing code to exploit them, with criminal groups and state-associated attackers actively using GPAI in their operations, though AI currently plays its largest role in scaling the preparatory stages of an attack—AI systems are not yet executing cyberattacks fully autonomously.
The report states that “general-purpose AI systems are already causing real-world harm”, while a key message is that while AI risk management practices are becoming more structured, real-world evidence of their effectiveness remains limited, with the report highlighting a growing mismatch between the speed of AI capability advances and the pace of governance. This gap between capability and control represents one of the report’s central concerns for security practitioners.
Google Translate Gemini Mode Vulnerable to Prompt Injection
Google Translate’s Gemini integration was exposed to prompt injection attacks that bypass translation to generate dangerous content through simple text commands, discovered this week through a method where users can enter a question in a foreign language with an English meta-instruction below it to make the system answer questions instead of translating. LLM jailbreaker ‘Pliny the Liberator’ demonstrated the exploit can produce dangerous content including instructions for making drugs and malware.
As of February 10, the vulnerability remains active in Google Translate’s production system, affecting users worldwide who rely on the service for daily translation needs. Prompt injection attacks remain explicitly excluded from Google’s bug bounty program, meaning security researchers receive no financial incentive to report these flaws through official channels.
The vulnerability highlights a fundamental architectural challenge: the flaw appears to affect Google Translate’s Advanced mode, which rolled out in late 2025 to provide more contextually accurate translations and relies on a large language model’s semantic understanding to interpret meaning across languages—however, this architectural design choice creates the prompt injection attack surface. This incident demonstrates that even core enterprise services can harbor exploitable AI vulnerabilities when LLMs are integrated without adequate isolation between instruction channels and user content.
Framework & Standards Updates
NIST Releases Preliminary Draft of Cyber AI Profile
The Cyber AI Profile (NIST Community Profile) was released as NIST IR 8596 iprd and made available for public comment through January 30, 2026, with the NCCoE hosting a hybrid workshop on January 14, 2026 to discuss the preliminary draft and updates on SP 800-53 Control Overlays for AI Systems.
The preliminary draft of the Cyber AI Profile is organized around three Focus Areas: Secure (securing AI systems); Defend (conducting AI-enabled cyber defense); and Thwart (thwarting adversarial cyberattacks using AI). The core concept of the Cyber AI Profile would be to apply the structure of the CSF 2.0 and the AI RMF to AI specific risks rather than to create a new, separate framework, providing guidelines for managing cybersecurity risk related to AI systems as well as identifying opportunities for using AI to enhance cybersecurity capabilities and leverage AI as a defensive tool.
The preliminary draft proposes to integrate AI-specific considerations across all six core functions of the NIST CSF 2.0, with sample considerations provided for each of the three focus areas, assigned proposed priority levels - “1” for High Priority, “2” for Moderate Priority, and “3” for Foundational Priority.
OWASP Top 10 for Agentic Applications 2026 Released
OWASP Top 10 for Agentic Applications 2026 is a practitioner-oriented security compass for organizations deploying autonomous AI agents, adapting the familiar OWASP Top 10 format to agentic systems that plan, delegate, and act across tools, identities, and other agents, identifying the ten highest-impact agent-specific risks from goal hijacking and tool misuse to rogue agents and cascading failures.
“Agentic AI systems plan, decide, and act across multiple steps and systems. Without strong controls, unnecessary autonomy quietly expands the attack surface and turns minor issues into system-wide failures”. The document includes valuable cross-mappings to the OWASP LLM Top 10, Agentic AI Threats & Mitigations, AIVSS risk scoring, CycloneDX/AIBOM, and Non-Human Identities Top 10, with a dedicated incident tracker grounding the framework in real-world exploits from 2025.
Vulnerability Watch
CVE-2025-32434: PyTorch Remote Code Execution (CVSS 9.3)
A vulnerability registered as CVE-2025-32434 was fixed in PyTorch version 2.6.0, with all previous versions up to 2.5.1 remaining vulnerable. The vulnerability has a CVSS rating of 9.3 (critical) and belongs to the Remote Code Execution (RCE) class, allowing exploitation under certain conditions to run arbitrary code when a malicious AI model is being loaded on the victim’s computer.
The flaw, discovered by security researcher Ji’an Zhou, undermines the safety of the torch.load() function even when configured with weights_only=True—a parameter long trusted to prevent unsafe deserialization, affecting PyTorch versions ≤2.5.1 and carrying a CVSS v4 score of 9.3. There is no evidence that someone is using CVE-2025-32434 in actual attacks, though the very fact of releasing a patch always attracts both researchers and attackers to the problem, so proof-of-concept exploits are most likely already being developed.
The team responsible for developing the PyTorch framework released update 2.6.0, in which CVE-2025-32434 was successfully fixed, with all previous versions remaining vulnerable and requiring updates as soon as possible.
CVE-2026-22778: vLLM ASLR Bypass Vulnerability (CVSS 9.8)
CVE-2026-22778, disclosed on February 2, 2026, is a CRITICAL 9.8 severity Address Space Layout Randomization (ASLR) bypass vulnerability that exists in vLLM versions 0.8.3 to before 0.14.1. vLLM is a widely-used high-throughput inference server for large language models.
This vulnerability represents a concerning trend of critical security flaws emerging in the ML infrastructure stack. Organizations using vLLM for production LLM deployments should prioritize updates to version 0.14.1 or later.
Picklescan Bypass Vulnerabilities in PyTorch Ecosystem
Three CVEs affecting Picklescan—a security tool for scanning PyTorch model files—enable attackers to evade malware detection: CVE-2025-10155 (CVSS 9.3) for file extension bypass, CVE-2025-10156 (CVSS 9.3) for ZIP archive scanning bypass via CRC errors, and CVE-2025-10157 (CVSS 9.3) for unsafe globals check bypass leading to arbitrary code execution.
The issues discovered by JFrog essentially make it possible to bypass the scanner, present the scanned model files as safe, and enable malicious code to be executed, which could then pave the way for a supply chain attack, with each discovered vulnerability enabling attackers to evade PickleScan’s malware detection and potentially execute a large-scale supply chain attack by distributing malicious ML models that conceal undetectable malicious code.
Industry Radar
Anthropic Releases Claude Code Security Tool
Claude Code Security, a new capability built into Claude Code on the web, is now available in a limited research preview. It scans codebases for security vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix security issues that traditional methods often miss. The release follows Anthropic’s February 2026 release of Claude Opus 4.6, followed by Sonnet 4.6.
Cybersecurity software stocks fell last week after Anthropic unveiled the security feature, with Bloomberg characterizing it as the latest example of software shares dropping due to worries about competition from AI firms, as the tool “scans codebases for security vulnerabilities and suggests targeted software patches for human review,” prefiguring a decline in companies such as Cloudflare.
OpenAI Introduces GPT-5.3-Codex with Enhanced Cybersecurity Capabilities
OpenAI introduced GPT-5.3-Codex on February 5, 2026, as the most capable agentic coding model to date, advancing both frontier coding performance and reasoning capabilities. It’s the first model OpenAI classifies as “High capability” for cybersecurity-related tasks under its Preparedness Framework, and the first directly trained to identify software vulnerabilities.
Over recent months, OpenAI has seen meaningful gains in model performance on cybersecurity tasks benefiting both developers and security professionals, and has been preparing strengthened cyber safeguards to support defensive use. While there is no definitive evidence the model can automate cyber attacks end-to-end, OpenAI is taking a precautionary approach and deploying its most comprehensive cybersecurity safety stack to date, including safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines including threat intelligence.
OpenAI Introduces Lockdown Mode and Elevated Risk Labels
OpenAI rolled out Lockdown Mode for high-security users and introduced Elevated Risk labels across ChatGPT, Atlas, and Codex to flag features with higher risk. These protections include Lockdown Mode in ChatGPT, an advanced, optional security setting for higher-risk users, and “Elevated Risk” labels for certain capabilities that may introduce additional risk.
Lockdown Mode is a new deterministic setting that helps guard data from being inadvertently shared with third parties by tightly constraining how ChatGPT can interact with certain external systems, and is available for ChatGPT Enterprise, ChatGPT Edu, ChatGPT for Healthcare, and ChatGPT for Teachers.
Major AI Infrastructure Partnerships Announced
Several strategic partnerships reshape the AI infrastructure landscape:
- Meta announced plans to build hyperscale data centers optimized for both training and inference, with the partnership enabling large-scale deployment of NVIDIA CPUs and millions of NVIDIA Blackwell and Rubin GPUs
- Rackspace Technology and Palantir Technologies announced a strategic partnership on February 18, 2026 to help enterprises rapidly deploy Palantir’s Foundry and AIP platforms, with Rackspace’s governed operating model providing consistent security and compliance from edge to core to cloud
- Mistral AI made its first acquisition by buying Koyeb, a Paris-based startup that simplifies AI app deployment at scale, confirming Mistral’s ambitions to position itself as a full-stack player. In June 2025, Mistral had announced Mistral Compute, an AI cloud infrastructure offering which it now hopes Koyeb will accelerate
Policy Corner
South Korea Enacts World’s First Comprehensive AI Laws
South Korea has made history by becoming the world’s first country to officially enact comprehensive laws specifically regulating artificial intelligence on February 2, 2026, marking a pivotal moment in global AI governance. This proactive legislative action provides a foundational legal framework designed to manage the multifaceted risks associated with AI, including data privacy, algorithmic bias, and security vulnerabilities within a national context, setting an important precedent for other nations.
Singapore Launches Model AI Governance Framework for Agentic AI
Singapore unveiled a pioneering Model AI Governance Framework specifically designed for Agentic AI on February 13, 2026, providing comprehensive guidance for the responsible development, deployment, and use of increasingly autonomous AI systems. For AI security, this is a significant development as it proactively addresses the unique challenges of agentic AI, such as managing emergent behaviors, ensuring accountability, and implementing safeguards against unintended consequences.
Colorado AI Act Implementation Delayed
The Colorado Artificial Intelligence Act (CAIA) requires risk management for AI-driven decisions in employment, housing, and healthcare and will be implemented as of June 30, 2026 (delayed from February 1, 2026). The delay provides organizations additional time to achieve compliance with the first comprehensive state-level AI law in the United States.
U.S.-India AI Opportunity Partnership Announced
On February 20, 2026, the governments of the United States and India acknowledged a shared vision for their innovation ecosystems through a Joint Statement on the U.S.-India AI Opportunity Partnership, a bilateral addendum to the Pax Silica Declaration, recognizing that the 21st century is likely to be defined by the physical backbone for artificial intelligence—from critical minerals and energy to compute and semiconductor manufacturing.
Research Spotlight
Prompt Injection and Jailbreaking Research Advances
Multiple significant papers on LLM security were published this week:
A paper titled “The Vulnerability of LLM Rankers to Prompt Injection Attacks” was submitted to arXiv on February 18, 2026, presenting a comprehensive empirical study of jailbreak prompt attacks against LLM rankers. Recent research showed that simple prompt injections embedded within candidate documents can significantly alter an LLM’s ranking decisions, and the paper focused evaluation on two complementary tasks: Preference Vulnerability Assessment measuring intrinsic susceptibility via attack success rate (ASR), and Ranking Vulnerability Assessment quantifying operational impact on ranking quality.
Anthropic’s Claude Opus 4.6 system card, released February 5 with 212 pages, breaks out attack success rates by surface, by attempt count, and by safeguard configuration, showing that when running a prompt injection attack in a constrained coding environment, it fails every time with a 0% success rate across 200 attempts requiring no safeguards.
A systematic evaluation titled “Red Teaming the Mind of the Machine” evaluated over 1,400 adversarial prompts across four LLMs: GPT-4, Claude 2, Mistral 7B, and Vicuna, analyzing results along several dimensions including model susceptibility, attack technique efficacy, prompt behavior patterns, and cross-model generalization.
Adversarial Machine Learning Survey Papers
A comprehensive survey paper published in May 2025 reviews the Adversarial Machine Learning (AML) landscape in modern AI systems, focusing on the dual aspects of robustness and privacy, initially exploring adversarial attacks and defenses using comprehensive taxonomies, and subsequently investigating robustness benchmarks alongside open-source AML technologies and software tools that ML system stakeholders can use to develop robust AI systems.
What This Means For You
Prepare for AI-augmented attacks now. The FortiGate compromise demonstrates that AI-enabled attacks have moved from theoretical to operational reality. Security teams should assume adversaries are already using commercial AI services to accelerate reconnaissance, vulnerability discovery, and exploit development. Conduct tabletop exercises simulating AI-augmented adversaries who can operate at machine speed and scale.
Implement AI-specific security controls immediately. The Google Translate prompt injection and ongoing jailbreak research show that traditional security controls don’t adequately protect AI systems. Adopt the NIST Cyber AI Profile framework as it matures, focusing first on the “Secure” pillar: input validation, output filtering, and strict separation between instruction channels and user content. Deploy LLM firewalls or API gateways with prompt injection detection before putting AI agents into production.
Audit your ML supply chain. The CVE-2025-32434 PyTorch vulnerability and Picklescan bypass flaws expose how ML frameworks and model distribution systems can be weaponized. Inventory all ML frameworks in your environment, verify model provenance before loading, and consider running models in sandboxed environments with restricted filesystem access. If you’re using PyTorch, upgrade to version 2.6.0 immediately. Treat model files with the same scrutiny as executable binaries—they are code, not data.
Establish governance for agentic AI deployments. With the release of the OWASP Top 10 for Agentic Applications, organizations have a concrete checklist for securing autonomous agents. Before deploying any agentic AI system, map its capabilities to the OWASP framework, implement least-privilege access for tool usage, and establish audit trails for all agent actions. The Singapore and South Korea regulatory frameworks signal that governance requirements are coming—get ahead of them now.
Balance innovation with precaution on advanced capabilities. OpenAI’s introduction of “Elevated Risk” labels and Lockdown Mode reflects the industry recognizing that some AI capabilities require additional controls. Evaluate whether your organization needs similar tiered access controls for AI tools, especially for roles handling sensitive data or having broad system access. Consider whether high-value targets in your organization (executives, security teams) should operate under restricted AI profiles similar to OpenAI’s Lockdown Mode.
Tools and Resources
NIST Cyber AI Profile (IR 8596 iprd) — Preliminary draft providing AI-specific considerations across NIST CSF 2.0 functions. Available for review with focus areas for securing AI systems, defending with AI, and thwarting AI-enabled attacks.
OWASP Top 10 for Agentic Applications 2026 — Practitioner-focused framework identifying highest-impact risks for autonomous AI agents. Includes cross-mappings to LLM Top 10 and real-world incident tracker.
International AI Safety Report 2026 — Comprehensive assessment of general-purpose AI capabilities and risks from 100+ international experts. Essential reading for understanding the current state of AI safety research.
Claude Code Security — Anthropic’s limited research preview tool that scans codebases for security vulnerabilities and suggests patches. Represents the emerging class of AI-powered security tools.
PyTorch 2.6.0 Security Update — Addresses CVE-2025-32434 (CVSS 9.3) RCE vulnerability in model loading. All organizations using PyTorch should upgrade immediately from versions ≤2.5.1.