Vulnerabilities

Anthropic's Mythos AI Can Write Zero-Day Exploits — But Can It Be Kept Safe?

April 10, 2026 20:35 · 5 min read

A New AI Model That Thinks Like an Attacker

On April 7, 2026, Anthropic unveiled Claude Mythos Preview, a general-purpose large language model (LLM) that the company described in a blog post as one that "performs strongly across the board, but it is strikingly capable at computer security tasks." The announcement has sent ripples through the cybersecurity community, raising urgent questions about who gets access to a tool this powerful — and what happens when it ends up in the wrong hands.

According to Anthropic, Mythos Preview can, at a user's direction, identify and exploit zero-day vulnerabilities, including subtle, difficult-to-detect flaws, in every major operating system and every major web browser. The company noted that one exploit involved a since-patched flaw in OpenBSD that had gone undetected for 27 years.

What Mythos Can Actually Do

The technical demonstrations Anthropic shared are striking. In one documented case, Mythos Preview wrote a web browser exploit that chained together four separate vulnerabilities, producing a complex JIT heap spray that successfully escaped both the renderer and OS sandboxes.

Anthropic was careful to note that a user does not need to be a security engineer to prompt the model effectively, which significantly lowers the barrier to entry for potential misuse.

Notably, these exploit-writing capabilities were not a deliberate engineering goal. Instead, they emerged as a downstream consequence of improvements to Mythos' general code and reasoning abilities. As Anthropic put it: "The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them."

Project Glasswing: A Defensive Counter-Initiative

In apparent anticipation of the dual-use concerns surrounding Mythos, Anthropic simultaneously launched Project Glasswing, a new initiative developed in partnership with Apple, AWS, Microsoft, Palo Alto Networks, and CrowdStrike. The company described Project Glasswing as an effort that could fundamentally "reshape cybersecurity" and called it "an urgent attempt to put these capabilities to work for defensive purposes."

As part of the initiative, Anthropic has extended Mythos Preview access to a group of more than 40 organizations to scan and secure first-party and open source systems. The company is also committing $100 million in Mythos Preview usage credits to Project Glasswing, along with $4 million in direct donations to open source security organizations.

Lee Klarich, chief product and technology officer at Palo Alto Networks, described early Mythos Preview results as "compelling" in a LinkedIn blog post. Anthropic also claimed it has already identified "thousands" of high-risk and critical security vulnerabilities through the model, which it says it is responsibly disclosing.

Why Anthropic Built This — and Why It Matters

Erik Nost, senior analyst at Forrester, told Dark Reading there are two reasons Anthropic moved forward with this release. First, it is good public relations for the company, essentially signaling that its AI is advanced enough to transform both cybersecurity and software development. Second, the announcement draws long-overdue attention to vulnerability detection gaps that the industry has struggled with for roughly 30 years.

Nost framed the release as a call to action for defenders. "It's a call to action, a heads-up, to defenders that vulnerability management practices are about to get very different," he said. He also characterized the current situation as "a race [for defenders] to remediate and patch before other AIs, in the wrong hands, discover these zero-days and rapidly write exploits."

Can Controls Really Keep This Tool Away From Attackers?

The parallel to tools like Cobalt Strike is hard to ignore — legitimate penetration testing frameworks that threat actors have long co-opted for malicious campaigns. The concern is that a model as capable as Mythos Preview could be similarly abused.

Julian Totzek-Hallhuber, senior principal solution architect at Veracode, argues that since there is no clear answer on how to prevent capable AI tools from proliferating into attacker hands, defenders should simply assume that they will. His recommended posture involves:

  1. Investing in detection rather than relying solely on prevention.
  2. Identifying the behavioral signatures of AI-assisted exploitation.
  3. Adopting zero-trust architecture.
  4. Implementing aggressive patching cycles and anomaly-based detection.
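To make the last item concrete: anomaly-based detection often reduces to comparing a live metric against a rolling baseline and alerting on large deviations. The sketch below is illustrative only; the metric (failed-auth counts per minute), window size, and z-score threshold are assumptions for the example, not anything Totzek-Hallhuber or the article specifies.

```python
from collections import deque
from statistics import mean, stdev

def is_anomalous(window, value, z_threshold=3.0):
    """Flag value if it sits more than z_threshold standard
    deviations above the rolling baseline in window."""
    if len(window) < 5 or stdev(window) == 0:
        return False  # not enough history to judge
    return (value - mean(window)) / stdev(window) > z_threshold

# Rolling baseline of a per-minute metric for one host
# (here: hypothetical failed-auth counts).
baseline = deque(maxlen=60)
alerts = []
for minute, failures in enumerate([2, 3, 2, 4, 3, 2, 3, 45]):
    if is_anomalous(baseline, failures):
        alerts.append((minute, failures))
    baseline.append(failures)

print(alerts)  # → [(7, 45)]: the burst stands out against the quiet baseline
```

A real deployment would track many metrics per host and tune the window and threshold to the environment; the point is that the baseline-plus-deviation pattern is cheap to run continuously, which is what makes it a plausible complement to aggressive patching.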

Melissa Ruzzi, director of AI at AppOmni, offered a frank assessment to Dark Reading: "No one can ever keep anything 100% out of attackers' hands. The best that can be done is to make it more difficult for them to get access to it."

Healthy Skepticism Is Still Warranted

Despite the impressive claims, there are reasons to pump the brakes. Totzek-Hallhuber points out that the early examples Anthropic shared, however compelling, represent just two data points — and two data points do not constitute a verified pattern.

More critically, he notes that "Anthropic controls both the model and the narrative; independent replication is impossible when the model isn't publicly available." Because Mythos Preview is only accessible to a restricted group of partners, outside researchers cannot run their own evaluations. That means the claims can neither be fully trusted nor refuted until independent verification is possible.

Dark Reading reached out to Anthropic requesting statistics on false positive rates and error rates. The company had not responded by press time.

What Defenders Should Do Now

The introduction of Mythos Preview — regardless of how tightly Anthropic tries to gate it — signals that the vulnerability management landscape is entering a new phase. AI-driven exploitation is no longer a theoretical risk; it is a demonstrated capability. Organizations that have deferred investments in detection engineering, zero-trust frameworks, and rapid patching programs may find themselves dangerously exposed as models like Mythos become more widely available, whether by design or by leak.

The broader industry challenge is not just technical. It is also about governance: who decides which organizations get access to systems this powerful, on what terms, and with what accountability mechanisms in place. For now, Anthropic's answer is Project Glasswing — but whether a $100 million credit program and a consortium of 40 partners is enough to outpace a determined adversary remains an open question.


Source: Dark Reading

