If a model is so dangerous it can't be publicly released — but someone accesses it anyway within hours of its announcement — was the restriction ever actually about safety?

That question became unavoidable this week as Anthropic confirmed it is investigating unauthorised access to Claude Mythos Preview, its restricted-access AI model capable of autonomously identifying and exploiting zero-day vulnerabilities in virtually every major operating system and web browser. On April 7, the same day Anthropic announced Mythos and its Project Glasswing consortium of 40 vetted organisations, a small group gained access through a third-party vendor environment, reportedly by guessing the model's URL from Anthropic's past URL conventions. They have had access ever since. What looked like responsible AI development looks, in retrospect, like security theatre with a press release attached.

"The dangers of getting this wrong are obvious, but if we get it right, there is a real opportunity to create a fundamentally more secure internet and world than we had before the advent of AI-powered cyber capabilities." — Dario Amodei

I don't doubt Amodei's sincerity. The Glasswing model, which restricts access to vetted defenders so that vulnerabilities can be patched proactively before the capability diffuses more broadly, is the right instinct. Mythos found a 27-year-old OpenBSD vulnerability, a 17-year-old FreeBSD remote code execution bug that grants complete root access to any machine running NFS, and thousands more critical flaws across major software systems. The theory behind restricted release is sound: give defenders a head start, and patch the worst vulnerabilities before adversaries can weaponise the same capabilities. Anthropic even sent Amodei to the White House to brief policymakers on the risks.

The real issue: the threat model behind "controlled access" assumed that keeping a dangerous AI model slightly out of reach was meaningful security. Mythos proved it isn't: the defence window lasted zero days.

The strongest defence of controlled release goes like this: even if some unauthorised users slip through, restricting access slows proliferation. Better to have 40 defenders and a handful of bad actors than to open the floodgates entirely. But this breach isn't a leak from a small hole — it's a structural failure. The unauthorised group has had continuous access for nearly three weeks, sharing live demonstrations and screenshots with journalists. They haven't launched attacks, but the point isn't their intentions. The point is that once Mythos existed, restricting it became a management problem, not a safety solution. And management problems are solvable — until they aren't.

The deeper issue is that the entire "responsible release" framing implicitly accepts the premise that the model had to be built in this form. It didn't. Anthropic made a deliberate choice to train a model with autonomous offensive cyber capabilities, then decided that access control would manage the risk. That sequencing is backwards. The hard question, one that no one in the Project Glasswing press coverage has seriously asked, is whether training a model to autonomously chain exploits across systems was the responsible decision in the first place, regardless of who is allowed to use it afterwards.

Amodei is right that more powerful models are coming. But the lesson of Mythos isn't about how to run better restricted-access programmes. It's that the industry needs an honest debate about which capabilities should be trained at all, because once a capability exists, containment is a race you're already losing.