AI Surpasses Autonomous Cyber Capability Benchmarks

Two advanced artificial intelligence models, Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5, have substantially exceeded the already-accelerating pace at which AI systems are completing autonomous cybersecurity tasks, according to separate findings published by the United Kingdom's AI Security Institute (AISI) and Palo Alto Networks.

The AISI, which conducts pre-deployment evaluations of frontier AI models on behalf of the British government, said both models outperformed the doubling trend it has tracked since late 2024. Earlier this year, the institute estimated that frontier models' 80%-reliability cyber time horizon, the length of cyber task they can complete autonomously at that reliability level, was doubling approximately every five months.
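
As a rough illustration of what a five-month doubling time implies, the sketch below computes how much a time horizon would grow if the doubling time stayed constant. The function name `growth_factor` and the 12-month window are illustrative assumptions, not part of the AISI's methodology; only the five-month figure comes from the institute's estimate.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplicative growth after elapsed_months, assuming a constant doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


# AISI's estimated doubling time for the tracked trend (from the article).
AISI_DOUBLING_MONTHS = 5.0

# Illustrative assumption: look at growth over a single year.
one_year = growth_factor(12, AISI_DOUBLING_MONTHS)
print(f"Growth over 12 months: about {one_year:.1f}x")
```

Under that assumption, a horizon that doubles every five months grows a little more than fivefold in a year. Models that exceed the trend, as the AISI says these two did, would imply faster growth than this baseline.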

Capability Jump

Whether the results represent an isolated capability jump or the start of a new, faster trajectory remains unclear. The clearest evidence of the capability jump came from the AISI's cyber ranges, its structured simulations of multi-stage attacks against small, undefended enterprise networks. A newer checkpoint of Claude Mythos Preview became the first model to complete both of the institute's ranges.

It solved The Last Ones, a 32-step simulated corporate network attack, in 6 of 10 attempts, and completed Cooling Tower — previously unsolved by any model — in 3 of 10 attempts. GPT-5.5 solved The Last Ones in 3 of 10 attempts.

Palo Alto Networks' Findings

Palo Alto Networks reached similar conclusions through its own testing. The company said it began testing Claude Mythos in April as a launch partner for Anthropic's Project Glasswing, and has since tested Claude Opus 4.7 and OpenAI's GPT-5.5-Cyber as part of OpenAI's Trusted Access for Cyber program.

The company released security advisories covering 26 CVEs representing 75 issues, all identified through AI model scanning across more than 130 products. Its typical monthly volume is fewer than five CVEs. The company said all important vulnerabilities in its SaaS products had been patched and that patches were available for all customer-operated products.

Implications and Recommendations

Palo Alto Networks outlined four immediate priorities for enterprises as use of these models grows. First, find and fix vulnerabilities in code and applications before attackers do. Second, shrink the attack surface and use AI to spot security misconfigurations. Third, deploy detection and response tools across all systems, using machine learning to catch threats in real time. Fourth, build security operations fast enough to respond in minutes, because AI-powered attacks may soon unfold that quickly.

The AISI said it is developing more demanding evaluations, including new cyber ranges and the addition of active cyber defenses, to better reflect real-world conditions as model capabilities continue to advance.

Frontier AI's autonomous cyber and software capability is advancing quickly: the length of cyber tasks that frontier models can complete autonomously has been doubling on a timescale of months, not years. No single benchmark result should be read as a precise measure of AI capability. Even so, the direction of change and the rapid growth have been consistent across the models, methodological choices and independent data examined.

Separate research from METR, a nonprofit that tracks how quickly AI handles software tasks, arrived at a nearly identical figure — a doubling time of approximately four months since late 2024.

The latest models are extraordinarily capable at finding vulnerabilities and turning them into critical exploit paths in near real time.

As AI models continue to grow in usage, it is essential for enterprises to prioritize their security operations and respond quickly to potential threats.

Source: CyberScoop
