METR

External AI safety testing organization that evaluated Claude Mythos Preview prior to release, alongside Epoch AI.

Role in Claude Mythos Preview Assessment

METR tested a pre-release snapshot of the model for autonomy capabilities, focusing on automated AI research. Key findings:

  • Claude Mythos Preview rediscovered 4 of 5 key insights on an unpublished machine learning task (vs 2/5 for Claude Opus 4.6)
  • METR estimated that an experienced research engineer would need several days to a week to ideate, test, and implement the insights the model discovered
  • The model exhibited deficits in research capabilities: lack of judgment about idea quality, insufficient hypothesis testing, and overconfident conclusions
  • These deficits, combined with time constraints, prevented the model from rediscovering the final insight and completing the full task
  • METR qualitatively described the model as “a significant step-up in real-world research utility” — researchers observed it testing hypotheses, debugging failures, and reasoning competently about complex problems

Important Caveats

  • The unpublished ML task may be “especially easy to verify” and well-suited to AI automation (well-scoped, clear verification signal, fast feedback loops, limited dependencies)
  • Claude Mythos Preview was severely time-constrained during the evaluations, so the results likely represent a lower bound on its performance
  • These findings were incorporated into Anthropic’s Autonomy Threat Model 2 assessment