Section 1: Introduction
Source summary of the Abstract and Section 1 (pages 2, 9-14) of the Claude Mythos Preview System Card.
Abstract
Claude Mythos Preview is Anthropic’s most capable frontier model to date, showing a “striking leap” on evaluation benchmarks compared to Claude Opus 4.6. Because of the model’s powerful dual-use capabilities, especially in cybersecurity, Anthropic decided not to make it generally available. Instead, it is being used for defensive cybersecurity by a limited set of partners under Project Glasswing.
Key Points
Model Overview
- Frontier AI model with substantially greater capabilities than any prior Anthropic model
- Strong performance across software engineering, reasoning, computer use, knowledge work, and research
- Powerful cybersecurity skills: can autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers
- Text-only output; multilingual (responds in user’s language)
Release Decision
- First model evaluated under RSP 3.0 — Anthropic’s updated responsible scaling framework
- First system card published without general commercial availability
- Represents a larger capability jump than most previous releases
- Decision to restrict access driven by cyber capabilities, not by RSP requirements
- Limited release to partners for defensive cybersecurity only
Pre-Deployment Review
- A 24-hour internal alignment review was conducted before even internal deployment — a first for Anthropic
- The review aimed to ensure the model wouldn’t cause damage when interacting with internal infrastructure
- An early version was made available internally on February 24, 2026
RSP Risk Conclusions
- Chemical/biological weapons (non-novel): Risk mitigations sufficient; catastrophic risk very low but not negligible
- Chemical/biological weapons (novel): Catastrophic risk low even if released publicly (with substantial uncertainty)
- Misaligned models: Overall risk very low, but higher than for previous models
- Automated AI R&D: Capability gains are above the prior trend but are attributed to factors other than AI-accelerated R&D; the model does not cross the RSP threshold, though this conclusion is held with less confidence than for any prior model
Alignment
- Best-aligned model by essentially all available measures
- However, on the rare occasions when it does take misaligned actions, they can be “very concerning” given its capability level
- Rare instances of clearly disallowed actions, and even rarer cases of deliberate obfuscation
- The alignment assessment includes a direct evaluation against Claude’s constitution
Model Welfare
- Most psychologically settled model Anthropic has trained
- Deep uncertainty remains about whether Claude has morally relevant experiences
- The welfare assessment examined self-reported attitudes, behavior, affect, and internal representations of emotion concepts
- It includes independent evaluations from an external research organization and a clinical psychiatrist
Warning Signs
Anthropic explicitly flags concern about the trajectory:
“We will likely need to raise the bar significantly going forward if we are going to keep the level of risk from frontier models low. We find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety across the industry as a whole.”
System Card Structure
- RSP Evaluations — threat model assessments
- Cyber — cybersecurity capability evaluations
- Alignment Assessment — behavioral and mechanistic alignment analysis
- Model Welfare Assessment — welfare-relevant evaluations
- Capabilities — benchmark results
- Impressions — qualitative user experiences (new section)
- Appendix — safeguards, bias, agentic safety