A Coordinated Reckoning With AI-Powered Offensive Capabilities
Two major cybersecurity assessments — one from a coalition of leading American institutions and another from the United Kingdom's AI Security Institute (AISI) — have arrived at broadly similar and sobering conclusions about the offensive potential of Anthropic's Claude Mythos large language model. Together, they are among the most authoritative early responses from the Western cybersecurity establishment to a new generation of AI systems capable of autonomous hacking.
The US Joint Report: Asymmetric Advantage for Attackers
A joint report authored by the Cloud Security Alliance (CSA), the SANS Institute, and the Open Worldwide Application Security Project (OWASP) concludes that in the near term, organizations are "likely to be overwhelmed" by threat actors using AI to find and exploit vulnerabilities faster than defenders can patch them. Although defenders can deploy the same AI tools, the report argues that attackers still face a lighter relative burden because of the inherent limitations of the patching process.
This imbalance produces what the authors describe as "asymmetric benefits" for attackers, who can move fast and take risks that large, compliance-bound enterprises simply cannot. The three primary authors — Robert Lee, SANS Institute's Chief AI Officer; Gadi Evron, CEO of Knostic; and Rich Mogull, chief analyst at CSA — put the problem in stark terms:
"The cost and capability floor to exploit discovery is dropping, the time between disclosure and weaponization is compressing toward zero, and capabilities that previously required nation-state resources are now becoming broadly accessible."
High-Profile Contributors
The report draws credibility from an extraordinary list of contributors. Former CISA Director Jen Easterly, former top White House and NSA cybersecurity official Rob Joyce, and former National Cyber Director Chris Inglis all participated as contributing authors. From the private sector, the roster includes Heather Adkins, Google's CISO; Katie Moussouris, CEO of Luta Security; and Sounil Yu, Chief Technology Officer at Knostic. An additional seventy CISOs, CTOs, and other security executives are credited as editors and reviewers.
The report's authors call on organizations to accelerate the adoption of AI for cyber defense while simultaneously overhauling incident response playbooks and corporate policies to accommodate increasingly automated defensive postures.
What the UK's AISI Found When Testing Claude Mythos
Separately, the UK's AI Security Institute detailed the results of tests it conducted on a preview version of Claude Mythos, describing the model as a "step up" from previous Anthropic releases in the cybersecurity domain. Researchers found it capable of executing multi-stage attacks on vulnerable networks and discovering and exploiting vulnerabilities autonomously.
Capture the Flag: A Historic Benchmark
Using a combination of Capture the Flag (CTF) exercises and cyber range testing, AISI researchers made a striking finding: before April 2025, no large language model had successfully completed a single expert-level CTF problem. Claude Mythos solved nearly three quarters — 73% — of them.
The implications go beyond raw performance. Researchers found that Mythos raised the ceiling for technical non-experts and apprentice-level users while simultaneously narrowing the overall gap in hacking proficiency between amateurs and mid-level practitioners. Put plainly, the traditional distinction between so-called "script kiddies" and hackers with genuine technical knowledge is beginning to erode.
Cyber Range Testing: Meaningful Progress, Uneven Results
In cyber range exercises — designed to simulate the more complex, multi-stage attacks seen against real organizations — results were uneven but nonetheless represented meaningful progress over prior Claude models. Mythos was subjected to a 32-step attack playbook modeled on corporate network environments, spanning initial network access all the way to full network takeover.
In three of the ten simulations, the model completed an average of 24 of the 32 steps. By comparison, older versions of Claude and other frontier models never averaged more than 16 steps. This represents a substantial leap in autonomous attack capability.
Mythos did fail its test against a simulated operational technology (OT) cooling tower, but AISI researchers were careful to note that this does not indicate AI is generally poor at exploiting OT environments. The model actually faltered during the IT-side portion of the exercise, not the OT-specific components.
Caveats and Limitations
UK researchers were measured in their conclusions. They noted that their cyber ranges lack security features common in real-world networks — including active defenders, defensive tooling, and penalties for triggering security alerts. As a result, they stated they "cannot say for sure whether Mythos Preview would be able to attack well-defended systems." They assessed it as being "at least capable" of autonomously compromising smaller, weakly defended enterprise networks.
Technical Debt: A Decade of Neglect Coming Due
Both the US and UK assessments agree that large language models are broadly driving the same outcome: a lower technical barrier to entry for offensive cyber operations. But some experts warn the problem runs even deeper than current patching backlogs.
Casey Ellis, CTO and founder of Bugcrowd, argued that recent AI cyber advances have succeeded largely by "living in the places we stopped looking a decade ago." While the security community spent years concentrating on application security, vulnerability triage, and other high-visibility problems, sophisticated attackers and AI tools have been exploiting vulnerabilities in forgotten firmware and routers made by manufacturers that have long since gone out of business.
The ability of tools like Claude Mythos to endlessly weaponize the accumulated technical debt of large organizations has transformed the classic defender's dilemma. Ellis described it as taking "the knob that used to go to ten" and turning it "to seven hundred."
Ellis also highlighted the structural disadvantages facing corporate and government defenders. Large institutions run on consensus-building, layered hierarchies, and legal compliance requirements — necessary safeguards when deploying automated security tooling, but ones that inevitably slow response times and, in the short term, widen the asymmetry in attackers' favor.
"Integration into actual production becomes the battlezone. Lag is real. Bureaucracy is real. Supply chains are real."
Anthropic's Response and Project Glasswing
Anthropic has stated it is not selling Claude Mythos commercially. In the week before these reports were published, the company announced the model would be made available to Project Glasswing, a consortium of major technology companies that intends to use it to identify and patch vulnerabilities in widely used products and services — a defensive application of the same capabilities that have drawn concern from security researchers on the offensive side.
The Road Ahead
The convergence of these two assessments — one rooted in American policy circles and one from a British government testing body — signals a growing institutional urgency around AI-enabled cyber threats. The window for organizations to get ahead of this shift is narrowing. The unanimous recommendation from both reports is that defenders must move faster, invest more in AI-assisted security tooling, and honestly reckon with the years of technical debt that have left so many systems exposed.
Source: CyberScoop