Responsible Scaling Policy (RSP)

Anthropic’s framework governing the safe development and release of frontier AI models. Claude Mythos Preview is the first model evaluated under RSP 3.0 (the third version).

Key Mechanisms

Risk Reports: Regularly published comprehensive assessments of model safety profiles
Capability thresholds: If a new model is “significantly more capable” than the models covered in the prior Risk Report, Anthropic must publish a discussion of how those capabilities change the risk analysis
Threat models assessed:
- Chemical and biological weapons production (novel and non-novel)
- Risks from misaligned models
- Automated AI R&D (threshold: compressing two years of progress into one)
- Autonomy risks
- Cyber risks

RSP 3.0 vs Previous Versions

No longer uses “AI Safety Levels” for capability thresholds (though still uses the term for clusters of risk mitigations)
Less emphasis on binary “rule-in/rule-out” thresholds; more emphasis on overall risk assessments
Regular Risk Reports now required alongside System Cards
Part of the Frontier Compliance Framework
The decision not to release Claude Mythos Preview publicly was not driven by RSP requirements; it was a separate judgment about cyber capability risks

Risk Assessment Process

Capability evaluations → internal feedback → Responsible Scaling Officer determination
Evidence gathered from automated evaluations, uplift trials, third-party expert red teaming, and third-party assessments
Multiple model snapshots evaluated; final determination based on production release candidates and training trends

Threat Models

CB-1: Known chemical/biological weapons production — can the model significantly help non-experts?
CB-2: Novel chemical/biological weapons production — can the model help expert teams create novel weapons?
Autonomy TM1: Early-stage misalignment risk — misaligned autonomous actions with irreversible consequences
Autonomy TM2: Automated R&D — AI systems that dramatically accelerate AI research itself

Findings for Claude Mythos Preview

Catastrophic risks remain low across all assessed threat models
CB-1: Threshold likely crossed; mitigations (classifier guards, access controls) applied; risk “very low but not negligible”
CB-2: Threshold not crossed; model lacks novel scientific reasoning capabilities
Autonomy TM1: Applicable; separate risk assessment released; risk “very low, but higher than for previous models”
Autonomy TM2: Not applicable; the ECI trend shows an upward bend (slope ratio of 1.86x–4.3x), but the bend is attributed to human research; this determination is held with less confidence than for any prior model
Anthropic acknowledges increasing reliance on subjective judgments rather than easy-to-interpret empirical results
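
Illustration: reading a slope ratio. The "slope ratio" quoted for Autonomy TM2 can be read as the slope of a capability trend after a breakpoint divided by its slope before it. The sketch below is only one plausible way to compute such a ratio, assuming a straight-line least-squares fit on each side of a chosen breakpoint. The function names, the breakpoint, and the data are hypothetical and are not taken from the RSP or from any Anthropic report.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def slope_ratio(dates, scores, breakpoint):
    """Slope after the breakpoint divided by the slope before it.

    The breakpoint itself is included in both segments (an assumption
    made here so that each segment has enough points to fit).
    """
    before = [(d, s) for d, s in zip(dates, scores) if d <= breakpoint]
    after = [(d, s) for d, s in zip(dates, scores) if d >= breakpoint]
    slope_before = least_squares_slope(*zip(*before))
    slope_after = least_squares_slope(*zip(*after))
    return slope_after / slope_before


# Made-up index values: +10 points per year up to year 2, then +20 per year.
dates = [0.0, 1.0, 2.0, 3.0, 4.0]
scores = [100.0, 110.0, 120.0, 140.0, 160.0]
ratio = slope_ratio(dates, scores, breakpoint=2.0)
print(f"slope ratio: {ratio:.2f}x")
```

With this made-up data the index rises twice as fast after the breakpoint as before it. A reported range such as 1.86x–4.3x may reflect sensitivity to choices like where the breakpoint is placed, though the source does not say how the range was obtained.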
