There is no such thing as an AI model too dangerous to release — there are only models too dangerous to lose control of. Anthropic drew that distinction for us today, and the difference is not semantic.
When Anthropic announced it had built Claude Mythos — a model capable of autonomously discovering and exploiting zero-day vulnerabilities in every major operating system and web browser — and chose not to release it publicly, the AI safety community breathed a collective sigh of relief. Here was a frontier lab exercising genuine restraint. Here was proof that the responsible development ethos Anthropic preaches is more than a marketing line. I thought so too. But today's reports that hackers have breached the systems housing Mythos should force us to ask an uncomfortable question: if the capability exists and can be stolen, what exactly did the non-release decision achieve?
The stakes are not abstract. Mythos Preview, deployed in restricted form through Project Glasswing — Anthropic's initiative giving controlled access to AWS, Apple, Google, JPMorgan, Microsoft, and Nvidia — discovered thousands of zero-day vulnerabilities across the world's most critical software infrastructure. Bugs lurking undiscovered for years. The oldest: a 27-year-old flaw in OpenBSD, an operating system prized precisely for its security rigor. More than 99% of those vulnerabilities remain unpatched today.
Anthropic's Mythos Preview identified thousands of critical zero-day vulnerabilities — including flaws in every major operating system and web browser — then declined to publish details while partners raced to patch them. Today, hackers reportedly declined to wait.
This is the contradiction at the heart of Anthropic's strategy. The company decided Mythos was too dangerous to release, but not too dangerous to build, train, and deploy in restricted settings. Those are not equivalent levels of control. A model deployed — even to a handful of vetted partners — is a model that can be exfiltrated. The choice was never between "release" and "safety." It was between "release with transparency" and "restrict with hope."
The real issue: Withholding a dangerous model without simultaneously accelerating remediation of the vulnerabilities it found is not a safety strategy — it is a liability strategy with a countdown clock.
The strongest defense of Anthropic's approach is that imperfect delay is still better than immediate, unrestricted release. Restricting access bought time: time for Project Glasswing partners to patch critical vulnerabilities before bad actors could exploit them at scale. That argument has genuine merit. Coordinated vulnerability disclosure is an established, proven practice in security, and Anthropic deserves credit for treating this capability with the gravity it warrants.
But coordinated disclosure only works if disclosure happens before exploitation does. Today's breach collapses that assumption. The attackers did not read the Project Glasswing timeline. They did not wait for Anthropic's orderly handoff. The 99% of vulnerabilities that remain unpatched are now exposed not because Anthropic released Mythos — but because Anthropic's containment failed anyway, without any of the safeguards a controlled, transparent release might have put in place.
The lesson is not that Anthropic was reckless to build Mythos, nor naive to withhold it. The lesson is that for models at this capability level, "we won't release it" cannot be the endpoint of a safety strategy; it can only be the beginning. Every frontier lab now building at this capability tier needs an answer to a question few have bothered to ask: what happens when containment fails? Today, the industry got its answer before asking. It is long past time to ask the question.