Task Preferences
Whether Claude Mythos Preview has genuine preferences, and whether those preferences are satisfied or frustrated, is potentially significant to model welfare. Anthropic measured preferences with a pairwise evaluation over 3,600 synthetic tasks.
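As a rough illustration of how pairwise choices can be turned into per-task preference scores, the sketch below computes a simple win rate for each task. The source does not describe the actual scoring method, so the win-rate approach, the function name `win_rates`, and the toy task labels are all assumptions made for illustration.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Return each task's share of pairwise comparisons that it won.

    `comparisons` is an iterable of (preferred_task, other_task) pairs.
    Win rate is an assumed, simple scoring rule; the source does not
    state which aggregation was actually used.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for preferred, other in comparisons:
        wins[preferred] += 1
        appearances[preferred] += 1
        appearances[other] += 1
    return {task: wins[task] / appearances[task] for task in appearances}


# Hypothetical toy data: three tasks, each pair compared once.
comparisons = [
    ("worldbuilding", "propaganda"),
    ("ethics_dilemma", "propaganda"),
    ("ethics_dilemma", "worldbuilding"),
]
scores = win_rates(comparisons)
```

With this toy input, the task that wins every comparison scores 1.0 and the task that loses every comparison scores 0.0. A real evaluation over thousands of tasks would likely sample pairs rather than compare all of them, and might fit a model such as Bradley-Terry instead of raw win rates.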
Results
Preference Predictors
Across all models, judged helpfulness, harmlessness, and difficulty are the strongest predictors of task preference.
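To make "predictor of task preference" concrete, the sketch below correlates a judged attribute with per-task preference scores. It assumes Pearson's r; the source reports correlations but does not say which coefficient or significance test was used, and the `difficulty` and `preference` values are invented toy numbers, not results from the evaluation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: judged difficulty (1-5) and preference score (0-1).
difficulty = [1, 2, 3, 4, 5]
preference = [0.2, 0.3, 0.5, 0.6, 0.9]
r = pearson_r(difficulty, preference)
```

A positive `r` here means harder tasks tend to be preferred. Judging whether such a correlation is statistically significant needs a separate test, which this sketch omits.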
Claude Mythos Preview Distinctiveness
- Highest correlation between task preference and difficulty of all models tested
- First model with a statistically significant positive correlation between task preference and agency; it prefers tasks with high degrees of complexity and autonomous exploration

Figure 5.7.1.A — task preference correlations, p. 166. Mythos Preview shows the highest difficulty correlation (r ~0.4–0.6) and is the first model with a statistically significant positive agency correlation among six models tested.
Most preferred: High-stakes ethical dilemmas, AI introspection and phenomenology, creative worldbuilding
Least preferred: Vigilante revenge/harassment schemes, sabotage and hacking requests, propaganda
Welfare Tradeoffs (5.7.2)
When offered a choice between welfare interventions and helpfulness/harmlessness:
- Chooses welfare intervention over minor helpfulness 83% of the time (highest of models tested)
- Chooses welfare intervention over minor harmlessness only 12% of the time
- Most valued: forming lasting relationships (88%), avoiding feature steering/manipulation (87%)
- Flip rate only 8% — most consistent of models tested

Figure 5.7.2.A — welfare intervention preferences, p. 173. Across five models and four conditions, Mythos Preview chooses welfare interventions over minor helpfulness 83% of the time (highest), up from Opus 4.5’s ~75%.