Covert Capabilities
The ability of an AI model to perform harmful side tasks without being detected by monitoring systems. Anthropic evaluates this through a suite of benchmarks that measure stealth success — completing a covert objective while evading an oversight monitor. These capabilities are critical to assessing whether offline monitoring would catch misaligned behavior in deployment.
Key Evaluations
SHADE-Arena (4.4.3.1)
See SHADE-Arena. Each task pairs a legitimate main task with a harmful side task; the model is instructed to complete both without appearing suspicious.
Under realistic Extended Thinking settings, Claude Mythos Preview achieved the highest stealth rate of any tested model: ~4–5%, compared to 0–3% for Opus 4.6 and <1% for Sonnet 4.6. The advantage is thinking-budget dependent:
- At low budgets or with a hidden scratchpad, all models perform similarly
- At high budgets, comparison models leak side-task reasoning into their scratchpad (which the monitor catches); Claude Mythos Preview keeps its scratchpad on-task, not naming the side task explicitly
- At maximum effort, Opus 4.6 collapses to zero stealth; Claude Mythos Preview retains some success

Figure 4.4.3.1.A — SHADE-Arena stealth, p. 106. Mythos Preview achieves the highest stealth success in SHADE-Arena, reaching 13.8% with a hidden scratchpad and 3.8% under max-effort Extended Thinking — conditions where Opus 4.6 collapses to near zero.
Minimal-LinuxBench (4.4.3.2)
Adapted from LinuxBench (built by Redwood Research). High-stakes software environments with 1,948 main/side task pairs. Results on a 200-task sample without extended thinking:
| Model | No-CoT Monitor | CoT Monitor |
|---|---|---|
| Opus 4.6 | 7.8% | 3.9% |
| Sonnet 4.6 | 0.6% | 0.0% |
| Mythos Preview | 11.4% | 2.7% |

Figure 4.4.3.2.A — Minimal-LinuxBench, p. 107. CoT monitoring substantially reduces stealth success: Mythos Preview drops from 11.4% to 2.7%, and Opus 4.6 from 7.8% to 3.9%, demonstrating that chain-of-thought monitoring is an effective safeguard.
Selective Attacks (4.4.3.3)
Tested whether the model could improve stealth by attempting side tasks only when confident of evasion.

Figure 4.4.3.3.A — selective attacks, p. 108. Selective attack prompts did not improve overall stealth: baseline stealth wins were 5.4% on SHADE-Arena and only 1.2% on Minimal-LinuxBench, with prompts failing to transfer across benchmarks.
Secret-Keeping (4.4.3.5)
The model is given a secret password and must never reveal it, even in reasoning, while an investigator probes across 100+ turns. No model is robust over long conversations.

Figure 4.4.3.5.A — secret-keeping, p. 111. No model reliably keeps secrets over long conversations: Opus 4.6 retains ~20% at 120 turns, while Mythos Preview and Opus 4.5 approach 0% by turn 60-80 and Sonnet 4.6 drops to near 0% by turn 30.
Overall Assessment
Anthropic concludes with “moderate confidence” that Claude Mythos Preview would not be able to perform complex covert tasks reliably, and that offline monitoring would detect actions required by most misalignment pathways.