Epoch AI

AI research organization that develops tools for tracking progress in AI capabilities. Tested a pre-release snapshot of Claude Mythos Preview, as did METR.

Epoch Capabilities Index

Epoch AI created the Epoch Capabilities Index (ECI), based on Ho et al.’s “A Rosetta Stone for AI Benchmarks.” The ECI uses Item Response Theory (IRT) to aggregate performance across many benchmarks into a single capability score per model.
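To make the aggregation idea concrete, here is a minimal sketch of an IRT-style fit. It is not Epoch's or Anthropic's implementation. The logistic form, the squared-error objective, the function name fit_irt, and the synthetic scores are all assumptions chosen for illustration. Each model gets one latent ability, each benchmark gets a difficulty and a slope, and benchmark scores that were never measured are simply masked out of the fit.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_irt(scores, steps=5000, lr=0.1):
    """Fit a toy IRT-style model (hypothetical helper, illustration only).

    scores: (n_models, n_benchmarks) array of accuracies in [0, 1],
            with NaN where a model was not evaluated on a benchmark.
    Assumed model: score ~ sigmoid(slope_b * (ability_m - difficulty_b)).
    """
    observed = ~np.isnan(scores)
    target = np.where(observed, scores, 0.0)
    n_models, n_benchmarks = scores.shape

    ability = np.zeros(n_models)
    difficulty = np.zeros(n_benchmarks)
    log_slope = np.zeros(n_benchmarks)  # slope = exp(log_slope) stays positive

    for _ in range(steps):
        slope = np.exp(log_slope)
        gap = ability[:, None] - difficulty[None, :]
        pred = sigmoid(slope * gap)

        # Gradient of 0.5 * squared error w.r.t. the logit; zero where unobserved.
        g = np.where(observed, (pred - target) * pred * (1.0 - pred), 0.0)

        ability -= lr * (g * slope).sum(axis=1)
        difficulty -= lr * (-g * slope).sum(axis=0)
        log_slope -= lr * (g * gap * slope).sum(axis=0)

        # The scale has no natural zero, so pin it by centering abilities.
        shift = ability.mean()
        ability -= shift
        difficulty -= shift

    return ability, difficulty, np.exp(log_slope)


# Synthetic example: 4 models, 3 benchmarks, two missing evaluations.
true_ability = np.array([-1.5, -0.5, 0.5, 1.5])
true_difficulty = np.array([-1.0, 0.0, 1.0])
scores = sigmoid(true_ability[:, None] - true_difficulty[None, :])
scores[0, 2] = np.nan  # weakest model never ran the hardest benchmark
scores[3, 0] = np.nan  # strongest model never ran the easiest benchmark

ability, difficulty, slope = fit_irt(scores)
ranking = np.argsort(ability).tolist()  # model indices, weakest to strongest
print("fitted abilities:", np.round(ability, 2))
print("ranking:", ranking)
```

The fitted ability plays the role of the single per-model capability score. Because every parameter is estimated jointly, models can be placed on one scale even when they were evaluated on different subsets of benchmarks, provided enough benchmarks are shared to link them.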

Anthropic forked Epoch’s implementation for internal use, extending it with proprietary benchmarks. The internal version draws on ~300 models (mostly from Epoch’s public dataset) and hundreds of benchmarks (mostly internal). Because few results connect the internal data to the external data (the stitch between the two is sparse), the two scales are only weakly anchored to each other, and Anthropic’s internal ECI scores are not directly comparable to Epoch’s public scores.

Role in Claude Mythos Preview Assessment

Epoch AI tested a pre-release snapshot of the model for autonomy capabilities. Their findings, along with METR’s, were incorporated into Anthropic’s Autonomy Threat Model 2 assessment.