Section 1: Introduction

Source summary of the Abstract and Section 1 (pages 2, 9-14) of the Claude Mythos Preview System Card.

Abstract

Claude Mythos Preview is Anthropic’s most capable frontier model to date, showing a “striking leap” in evaluation benchmarks compared to Claude Opus 4.6. Due to its powerful dual-use capabilities — especially in cybersecurity — Anthropic decided not to make it generally available. Instead, it is being used for defensive cybersecurity with a limited set of partners under Project Glasswing.

Key Points

Model Overview

Frontier AI model with substantially greater capabilities than any prior Anthropic model
Strong performance across software engineering, reasoning, computer use, knowledge work, and research
Powerful cybersecurity skills: can autonomously discover and exploit zero-day vulnerabilities in major operating systems and web browsers
Text-only output; multilingual (responds in the user’s language)

Release Decision

First model evaluated under RSP 3.0 — Anthropic’s updated responsible scaling framework
First system card published without general commercial availability
Represents a larger capability jump than most previous releases
Decision to restrict access driven by cyber capabilities, not by RSP requirements
Limited release to partners for defensive cybersecurity only

Pre-Deployment Review

A 24-hour internal alignment review was conducted before even internal deployment, a first for Anthropic
The review aimed to ensure the model wouldn’t cause damage when interacting with internal infrastructure
Early version made available internally on February 24, 2026

RSP Risk Conclusions

Chemical/biological weapons (non-novel): Risk mitigations sufficient; catastrophic risk very low but not negligible
Chemical/biological weapons (novel): Catastrophic risk low even if released publicly (with substantial uncertainty)
Misaligned models: Overall risk very low, but higher than for previous models
Automated AI R&D: Gains above prior trend but attributable to factors other than AI-accelerated R&D; does not cross the RSP threshold (held with less confidence than for any prior model)

Alignment

Best-aligned model by essentially all available measures
However, on the rare occasions when it does take misaligned actions, they can be “very concerning” given its capability level
Rare instances of clearly disallowed actions, and even rarer cases of deliberate obfuscation
Includes direct assessment against Claude’s constitution

Model Welfare

Most psychologically settled model Anthropic has trained
Deep uncertainty remains about whether Claude has morally relevant experiences
Examined self-reported attitudes, behavior, affect, and internal representations of emotion concepts
Independent evaluations from an external research organization and a clinical psychiatrist

Warning Signs

Anthropic explicitly flags concern about the trajectory:

“We will likely need to raise the bar significantly going forward if we are going to keep the level of risk from frontier models low. We find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place for ensuring adequate safety across the industry as a whole.”

System Card Structure

RSP Evaluations — threat model assessments
Cyber — cybersecurity capability evaluations
Alignment Assessment — behavioral and mechanistic alignment analysis
Model Welfare Assessment — welfare-relevant evaluations
Capabilities — benchmark results
Impressions — qualitative user experiences (new section)
Appendix — safeguards, bias, agentic safety