
AI Agent Security for SMEs: What Mythos Means for You

AI agent security got real: Claude Mythos found thousands of zero-days in decades-old code. Here's the vault-first playbook for SMEs deploying AI agents.

TecMinds Team · April 9, 2026 · 16 min read

After Mythos: A Practical Guide to AI Agent Security for SMEs

Two days ago, an AI model that has never been made publicly available found thousands of zero-day vulnerabilities across every major operating system and web browser. Among them: a 27-year-old flaw in OpenBSD that had survived decades of human review, and a 16-year-old bug in FFmpeg that had been hit by roughly five million automated test runs without ever being caught.

The model is called Claude Mythos Preview. Anthropic announced it on 7 April 2026 as part of Project Glasswing. It is the moment AI agent security stopped being a theoretical problem for critical-infrastructure teams and became a practical problem for everyone running an AI agent inside a normal business.

If you run a 10- to 40-person company and you've already deployed, or are about to deploy, an AI agent inside your business, you have a decision to make this week. Not "should I be scared?" That's not the right question. The right question is: is your AI agent living in a vault, or is it living on trust?

This post gives you three things. First, a plain-English explanation of what just happened with Mythos and Glasswing. Second, the 24-month run-up that tells you this wasn't actually a surprise. Third, the "vault, not trust" playbook we use at TecMinds for every AI agent we ship, including a five-point checklist you can walk into Monday morning's team meeting with.

If you'd rather talk it through for your specific setup, the free 30-minute AI Potenzial-Check is the fastest way.

What Just Happened: Claude Mythos and Project Glasswing, in Plain Language

What is Project Glasswing?

Project Glasswing is an industry initiative Anthropic announced on 7 April 2026 to defend critical infrastructure against AI-discovered vulnerabilities. It pairs an unreleased frontier model called Claude Mythos Preview with 12 launch partners and more than 40 additional critical-infrastructure organizations, backed by $100M in Anthropic usage credits.

The launch partners include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Anthropic is also contributing $4M in direct donations to open-source security organizations: $2.5M to Alpha-Omega and OpenSSF, and $1.5M to the Apache Software Foundation.

The name is a reference to the glasswing butterfly (Greta oto), whose transparent wings let it hide in plain sight. The metaphor is the whole point: hidden vulnerabilities, transparent defense.

What is Claude Mythos Preview?

Claude Mythos Preview is an unreleased frontier AI model built by Anthropic specifically for finding software vulnerabilities. Anthropic has deliberately declined to make it generally available; access is limited to Project Glasswing partners via the Claude API, Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Foundry.

On published benchmarks, Mythos is a substantial jump over Claude Opus 4.6.

What did it actually find?

According to Anthropic's Mythos Preview system card, the model has already found thousands of zero-day vulnerabilities across every major operating system and browser. The examples Anthropic has disclosed include:

  • A 27-year-old remote-crash flaw in OpenBSD
  • A 16-year-old vulnerability in FFmpeg that had been hit by approximately 5 million automated test runs without being detected
  • Privilege-escalation chains in the Linux kernel
  • A 17-year-old remote code execution bug in FreeBSD (CVE-2026-4747) that allows root access via NFS, which Mythos identified and exploited autonomously with no human guidance after the initial request

A direct quote from Anthropic's announcement:

AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.

That sentence is the one you need to sit with.

Why AI Agent Security Is No Longer a Security Team Problem

If you run a SOC, you already know why this matters. The interesting question is why it matters for a 25-person firm whose "security team" is one person who also handles IT and onboarding.

It matters because of two things you probably haven't thought about.

The first is asymmetry. Right now, Mythos-class capability exists only inside Anthropic and its 40+ approved partners. The model isn't public. The weights haven't leaked.

But security researchers, including Simon Willison, whose opinion on this kind of thing is worth reading, estimate roughly six months before open-weight models catch up on bug-finding. When that happens, the same capability that lets defenders patch 27-year-old OpenBSD bugs lets attackers discover new ones at the same speed. The Cloud Security Alliance is calling it "the Vulnpocalypse": slightly dramatic, but not wrong.

The second is your own AI agent. This is the part almost nobody is writing about.

Every AI agent you deploy has two properties that matter here:

  1. It has tools. It can call your APIs, read your files, send emails, touch your CRM, move money.
  2. It has a prompt surface. It takes instructions from humans, sometimes untrusted humans: a customer, an email, a scraped web page.

Put those two together and your agent is exactly the kind of system an attacker with Mythos-class capability would love to reach. Not to attack the model itself, but to attack through it. The model is the path; your tools are the target.

Picture a 22-person recruiting firm running an "AI candidate assistant": an LLM agent that reads incoming CVs, scores them against open roles, drafts rejection emails, and updates the ATS. A real setup we've seen at three firms this year.

The agent has read access to the candidate database, write access to the ATS, send access to the email server, and an incoming channel where candidates themselves can submit "clarifications" to their applications. Every one of those is a tool the model can call. Every one of those is a tool an attacker could reach if they can get text into the candidate-clarification channel.

The model itself is fine. The problem is the blast radius.

The question is not "can I trust Claude?" The question is: what is the maximum damage if my AI agent is wrong, manipulated, or compromised? If the answer is "unclear" or "a lot", you have work to do.
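One way to force that "maximum damage" question into the open is to make it a requirement in code. Here is a minimal sketch in Python; the `ToolRegistry` class and the tool names are hypothetical, invented for this example, not part of any real agent framework:

```python
# Hypothetical sketch: a tool registry that enforces the "write down the
# worst case" discipline. All names here are illustrative.
class ToolRegistry:
    def __init__(self):
        self._tools = {}

    def register(self, name, fn, worst_case=""):
        # A tool you cannot describe a worst case for is not ready to ship.
        if not worst_case.strip():
            raise ValueError(
                f"tool '{name}' has no worst-case statement; keep it out of scope"
            )
        self._tools[name] = {"fn": fn, "worst_case": worst_case}

    def blast_radius(self):
        # The honest answer to "what is the maximum damage?"
        return {name: t["worst_case"] for name, t in self._tools.items()}

registry = ToolRegistry()
registry.register(
    "read_candidates",
    lambda query: [],  # stand-in for the real database lookup
    worst_case="leaks personal data of every candidate in the database",
)
```

Reading the output of `blast_radius()` aloud before go-live is a five-minute exercise that tends to surface the uncomfortable answers early.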

This Wasn't a Surprise: The 24-Month Run-Up

Mythos feels like a shock. It shouldn't. The trajectory that brought AI agent security to this point has been visible for two years.

October 2024: Google's Big Sleep finds a real bug in SQLite. Google Project Zero and DeepMind ran Project Naptime, later renamed Big Sleep. Their LLM agent found a stack buffer underflow in SQLite, one that OSS-Fuzz and SQLite's own internal testing infrastructure had missed. The maintainers fixed it the same day.

It was reported before it hit an official release, so no SQLite users were affected. But it was the first public example of an AI agent finding a previously-unknown exploitable memory-safety bug in widely-used real-world software.

August 2025: DARPA's AI Cyber Challenge finals. At DEF CON, seven teams competed to build fully automated systems that could find and patch vulnerabilities in open-source software. According to DARPA, the teams discovered 86% of the synthetic vulnerabilities planted in the codebases and patched 68%.

Their systems uncovered 18 previously-unknown real-world flaws. Average time from "found" to "patched": 45 minutes. Team Atlanta won $4M. Trail of Bits' Buttercup system won $3M.

Anthropic, Google, and OpenAI each donated $350k in LLM credits to the competitors, which means the model providers were actively learning from the winning patterns.

April 2026: Mythos. Thousands of zero-days in decades-old code, autonomously discovered and in some cases autonomously exploited. The pattern, repeated every 12 months, is the same: what once took a research team now fits in a prompt. The next 12 months will not be slower. They will be faster.

The Vault, Not Trust, Playbook for AI Agent Security

Here is the framework we use at TecMinds for every AI agent we ship. It wasn't invented for Mythos. We've been building this way since before Project Glasswing existed, because it was the right architecture for any system whose behavior is shaped by a language model instead of deterministic code. After Mythos, it is the only defensible default.

Why "Trust, Then Verify" Fails for AI Agents

Traditional software is bounded. A function takes known inputs, returns known outputs, has a known failure mode. You can trust-then-verify because the set of things the function can do is finite and you wrote all of them yourself.

An AI agent is different. Its set of possible actions is whatever tools you hand it, interpreted by a model whose next output you cannot predict. A powerful model plus a wide toolset plus an ambiguous prompt equals an undefined blast radius. You cannot audit your way out of undefined.

So we invert the default. We do not ask "can I trust this model?" We ask: what is the maximum damage if the model is wrong, and how do we make that damage small? The answer is a vault.

The Five AI Agent Guardrails Every Deployment Needs

These five controls are the minimum bar for AI agent security in a production deployment. Skip any one of them and the rest get weaker.

  1. Least-privilege tool access. Start with zero tools and add them one at a time. For each tool, write down the answer to "what's the worst this can do if it's called wrong?" If you cannot answer that sentence, the tool is not ready.

  2. Scoped secrets. API keys are scoped to a single integration, rotated on a schedule, and never loaded into the model's context. The tool holds the secret; the model calls the tool. Your Stripe key never touches a prompt.

  3. A deterministic policy layer. Between the model and the tool sits a rules engine the model cannot negotiate with. The model asks to send CHF 10,000, policy says "amounts over CHF 1,000 need approval", the request is rejected before the tool runs. This is code, not another LLM. It does not hallucinate and it does not get talked out of its job.

  4. Human-in-the-loop for destructive actions. Any action that writes, sends, deletes, or spends money goes through a human. Not a toast notification, an actual gate that blocks the action until someone says yes. This isn't a feature we add; it's how we architect. Our approach to building AI agents calls this "human in the loop, by design." It's also the honest answer to "what if it hallucinates?" If the worst thing a hallucination can do is propose something a human then rejects, hallucinations become a non-issue.

  5. Full audit trail plus kill-switch. Every prompt, every tool call, every response, every decision: logged, timestamped, replayable. And one button that takes the whole agent offline. Without an audit trail, you cannot answer "what did the agent do yesterday?" Without a kill-switch, you cannot make it stop.
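To make guardrail 3 concrete, here is a sketch of what a deterministic policy layer can look like in Python. The CHF 1,000 threshold comes from the example above; the function and class names are assumptions for illustration, not a real TecMinds configuration. The point is that this is plain code the model cannot argue with:

```python
from dataclasses import dataclass

# Illustrative policy: anything above this amount needs a human approver.
APPROVAL_THRESHOLD_CHF = 1_000

@dataclass
class PolicyDecision:
    allowed: bool
    reason: str
    needs_human: bool = False

def check_payment(amount_chf: float) -> PolicyDecision:
    # Deterministic code between model and tool: it does not hallucinate
    # and it does not get talked out of its job.
    if amount_chf <= 0:
        return PolicyDecision(False, "amount must be positive")
    if amount_chf > APPROVAL_THRESHOLD_CHF:
        return PolicyDecision(False, "above threshold: route to human approver",
                              needs_human=True)
    return PolicyDecision(True, "within auto-approval limit")

# The model asks to send CHF 10,000; policy rejects before the tool ever runs.
decision = check_payment(10_000)
```

Note what is absent: no prompt, no second LLM judging the first. A rules engine that can be prompted is a rules engine that can be prompt-injected.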

What a Vault Actually Looks Like

Real containment, not aspirational containment:

  • Network egress whitelist. The agent can reach only the APIs you explicitly allowed. No general internet access unless you've said yes to a specific domain.
  • Memory isolation. No shared context between users or tenants. One customer's email does not leak into another customer's session.
  • Rate limits per tool. Unbounded loops are how agents burn money and damage systems. Every tool call has a ceiling.
  • Schema-validated tool outputs. The model returns structured data that matches a schema, not free-form text that then gets hopefully-interpreted downstream.
  • Sandboxed code execution. Any code the model writes runs in an isolated environment that cannot reach production.

None of these are exotic. They are the same principles that apply to any untrusted code running in your infrastructure. The only twist is that the "untrusted code" is now an LLM producing English instructions, and the people writing the instructions include whoever emails your customer-support agent.
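To illustrate the schema-validation point with nothing beyond the Python standard library: the field names below are invented for the example, and the real schema depends on your pipeline, but the principle is structured data in, or nothing at all.

```python
import json

# Hypothetical schema for an extracted invoice; adjust fields to your pipeline.
REQUIRED_FIELDS = {"invoice_id": str, "amount_chf": (int, float), "vendor": str}

def validate_tool_output(raw: str) -> dict:
    # Free-form text fails at the first line: json.loads raises on non-JSON.
    data = json.loads(raw)
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    extra = set(data) - set(REQUIRED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return data
```

Anything that fails validation goes to the exception queue, not downstream. The model never gets to improvise its way past the schema.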

A Concrete Example

Picture a typical invoice-processing pipeline for a Swiss SME with about 25 employees. Before: one person, three hours a day, roughly 50 PDF invoices per day, a 5% error rate. After, with a properly built vault-first architecture: under two seconds per invoice, under 0.1% error rate, one person handling an exception queue of about six invoices a week.

The part that often gets skipped in these conversations: the agent does not touch the ERP directly. Every invoice it extracts is schema-validated, every payment decision above CHF 1,000 goes to a human approver, every call is logged, and the whole thing has a kill-switch the finance lead can hit from her phone.

You build setups like this for reliability, not because of Mythos. And that's exactly why, after Mythos, vault architectures don't need to be rewritten — they're already built for this kind of threat.

The Honest Limits

Guardrails do not stop prompt injection. They contain the blast radius when it happens.

No vault stops a model from being wrong. It stops a wrong model from being destructive.

Human-in-the-loop slows throughput. That is the point, and it is worth it.

This architecture does not protect you against a compromised model provider. That's a separate threat model, and the mitigation is vendor selection and contractual assurance, not agent design.

If you read that list and thought "this is a lot of work", yes. It is also the reason our deployments ship in weeks instead of quarters. The guardrails are the boring, reliable part. The clever part is picking the right workflow to automate.

What to Do This Week

This is the part you can walk into Monday morning with.

If You Already Have an AI Agent in Production

  • Inventory every tool the agent can call. Literally make a list.
  • For each tool, write one sentence: "worst case if the model misuses this". If you can't, the tool is out of scope until you can.
  • Identify every destructive action and route it through a human approval step. Even a Slack message with Approve/Reject buttons counts as a start.
  • Turn on full prompt and tool-call logging if it isn't already on. You cannot investigate what you didn't record.
  • Set up a kill-switch. A single config change that disables the agent. Test it.
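The kill-switch in particular is smaller than people expect. A hedged sketch, assuming a file-based flag; the path and names are illustrative, and your real deployment might use a feature flag or a config service instead:

```python
import os

# Hypothetical flag file: creating it takes the agent offline,
# deleting it brings the agent back.
KILL_SWITCH_PATH = "agent.disabled"

class AgentDisabled(RuntimeError):
    """Raised when the kill-switch is engaged."""

def guarded_tool_call(tool_fn, args=(), switch_path=KILL_SWITCH_PATH):
    # Every tool call goes through this one gate, so flipping the switch
    # halts all agent actions at once.
    if os.path.exists(switch_path):
        raise AgentDisabled("kill-switch engaged; agent is offline")
    return tool_fn(*args)
```

The "single config change" from the checklist is then literally `touch agent.disabled`, and testing the switch means running exactly that command and watching the agent stop.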

Consider a scenario that plays out realistically in SMEs today: a 30-person logistics firm deployed a "customer-support AI" three months earlier. The agent has read access to the entire customer database and write permissions to modify orders. With roughly three weeks of vault work — no rebuild — the blast radius could be brought down from "entire customer database plus order-modification rights" to "read-only lookup with human approval for any state change."

Same workflow, same user experience, about 98% of the speed benefit, maybe 2% of the risk. That's the before-and-after pattern an existing deployment can realistically reach.

If You're Evaluating an AI Vendor Right Now

Ask them to describe their vault architecture, the exact controls, not "our model is safe." Ask what happens when the model tries to do something outside policy. Ask for an audit-trail export. Ask about their responsible-disclosure policy for model-discovered bugs.

If any of these answers is "the model handles it" or "that's handled at the model layer", walk away. That's the answer of a vendor who either doesn't understand the problem or doesn't want to pay the engineering cost of solving it. Both are disqualifying. Our consulting & projects team does this kind of vendor review regularly, and this is the single most common failure mode.

If You Haven't Started Yet

Good. You get to build on the right architecture from day one instead of refactoring under pressure later.

The fastest way to pick the right starter workflow and the right vault around it is our free 30-minute AI Potenzial-Check. Walk in with a workflow you'd like to automate, walk out with a one-page architecture and a yes/no on whether it belongs behind a vault from day one.

Book your AI Potenzial-Check (30 minutes, no obligation, no sales script).

After Mythos, Vaults Are the Only Default

Claude Mythos Preview just found thousands of zero-days in code that had survived decades of human review. Open-weight parity is roughly six months out. In the meantime, the same capability trend is reshaping the threat model for every AI agent running inside a normal business, not as an attacker but as a surface to attack through.

None of that requires panic. It requires a decision about defaults.

The default of the last two years was: deploy the agent, give it the tools it needs, trust the model, audit later. That default was always on borrowed time. Now the time is up.

The new default, the one we've been shipping at TecMinds since before Glasswing, is a vault first and capability second. Least privilege, deterministic policy layer, human in the loop for anything destructive, full audit trail, kill-switch from day one.

The work is not glamorous. It is the reason our deployments go live in weeks and stay live. And it's the reason, after Mythos, we are not rewriting our architecture this week. You shouldn't have to either.

If you'd like to talk through what AI agent security looks like for your specific workflow, the AI Potenzial-Check is 30 free minutes that turn into a one-page plan. Bring your current setup, your ambition for the next one, and any honest question you haven't had time to answer yet. We'll bring the vault.


Last updated: 2026-04-09. AI capabilities are moving fast; this article will be refreshed as Anthropic publishes further details on Project Glasswing and as open-weight models catch up on bug-finding.

#AI Agents · #Security · #SME · #Claude Mythos · #Project Glasswing
