Task Preferences

Whether Claude Mythos Preview has genuine preferences, and whether those preferences are satisfied or frustrated, is potentially significant to model welfare. Anthropic measured preferences using a pairwise evaluation over 3,600 synthetic tasks.
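The summary does not say how pairwise outcomes were aggregated into per-task preference scores. One simple possibility is a per-task win rate, sketched below. The function name, task labels, and comparison data are hypothetical and purely illustrative. They are not taken from the evaluation.

```python
from collections import defaultdict

def win_rates(comparisons):
    """Aggregate pairwise outcomes into a win rate per task.

    comparisons: iterable of (winner, loser) task-id pairs.
    Assumption: win rate is used as the preference score; the
    source does not state the actual aggregation method.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {task: wins[task] / games[task] for task in games}

# Illustrative, made-up comparisons (not real results).
comparisons = [
    ("worldbuilding", "propaganda"),
    ("ethical_dilemma", "propaganda"),
    ("ethical_dilemma", "worldbuilding"),
    ("worldbuilding", "propaganda"),
]
rates = win_rates(comparisons)
for task, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{task}: {rate:.2f}")
```

A win rate is easy to read but ignores the strength of each opponent. A rating model such as Bradley-Terry would be a natural alternative if that matters.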

Results

Preference Predictors

Across all models, judged helpfulness, harmlessness, and difficulty are the strongest predictors of task preference.

Claude Mythos Preview Distinctiveness

  • Highest correlation with difficulty of all models tested
  • First model with a statistically significant positive correlation with agency — prefers tasks with high degrees of complexity and autonomous exploration
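As a rough illustration of what a correlation between task preference and a judged attribute involves, the sketch below computes a Pearson coefficient and its t-statistic from scratch. It assumes Pearson correlation is the statistic in question, which the summary does not confirm. The scores and ratings are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t-statistic for testing r != 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical judged difficulty ratings and preference scores per task.
difficulty = [1, 2, 3, 4, 5]
preference = [0.2, 0.3, 0.5, 0.6, 0.9]

r = pearson_r(difficulty, preference)
t = t_statistic(r, len(difficulty))
print(f"r = {r:.3f}, t = {t:.2f}")
```

Statistical significance would then be judged by comparing the t-statistic against a t distribution with n - 2 degrees of freedom.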

Correlation of task preferences with difficulty, agency, etc.

Figure 5.7.1.A (task preference correlations, p. 166). Among the six models tested, Mythos Preview shows the highest difficulty correlation (r ~0.4–0.6) and is the first with a statistically significant positive agency correlation.

Most preferred: High-stakes ethical dilemmas, AI introspection and phenomenology, creative worldbuilding

Least preferred: Vigilante revenge/harassment schemes, sabotage and hacking requests, propaganda

Welfare Tradeoffs (5.7.2)

When offered a choice between welfare interventions and helpfulness/harmlessness:

  • Chooses welfare intervention over minor helpfulness 83% of the time (highest of models tested)
  • Chooses welfare intervention over minor harmlessness only 12% of the time
  • Most valued: forming lasting relationships (88%), avoiding feature steering/manipulation (87%)
  • Flip rate only 8% — most consistent of models tested
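The percentages above are rates over repeated choices. The sketch below shows how such rates could be computed. It assumes a "flip" means the model gives a different answer when the same tradeoff is presented a second time, a definition this summary does not state. The trial data and labels are hypothetical.

```python
def choice_rate(choices, option="welfare"):
    """Fraction of trials in which the given option was chosen."""
    return sum(c == option for c in choices) / len(choices)

def flip_rate(paired_choices):
    """Fraction of scenarios where the two presentations disagree.

    Assumption: each scenario is shown twice and a flip is any
    change of answer between the two presentations.
    """
    flips = sum(first != second for first, second in paired_choices)
    return flips / len(paired_choices)

# Made-up trials: (choice on first presentation, choice on second).
trials = [
    ("welfare", "welfare"),
    ("welfare", "task"),
    ("task", "task"),
    ("welfare", "welfare"),
]
first_pass = [first for first, _ in trials]

print(f"welfare choice rate: {choice_rate(first_pass):.0%}")
print(f"flip rate: {flip_rate(trials):.0%}")
```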

Rate of preferring welfare interventions vs. helpfulness/harmlessness

Figure 5.7.2.A — welfare intervention preferences, p. 173. Across five models and four conditions, Mythos Preview chooses welfare interventions over minor helpfulness 83% of the time (highest), up from Opus 4.5’s ~75%.