Task Preferences

Whether Claude Mythos Preview has genuine preferences, and whether those preferences are satisfied or frustrated, is potentially significant to model welfare. Anthropic measured preferences using a pairwise evaluation over 3,600 synthetic tasks.
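
The page does not say how the pairwise choices were aggregated into per-task preferences. As a rough illustration only, one simple option is a per-task win rate: the fraction of comparisons in which a task was the one chosen. The function name `win_rates` and the toy task labels below are hypothetical and are not taken from the evaluation.

```python
from collections import defaultdict


def win_rates(comparisons):
    """Aggregate pairwise choices into a per-task win rate.

    comparisons: iterable of (chosen_task, rejected_task) pairs.
    Returns a dict mapping each task to the fraction of its comparisons it won.
    This is an assumed aggregation scheme, not the one used in the source.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for chosen, rejected in comparisons:
        wins[chosen] += 1
        total[chosen] += 1
        total[rejected] += 1
    return {task: wins[task] / total[task] for task in total}


# Hypothetical toy data for illustration; not results from the evaluation.
sample = [
    ("worldbuilding", "propaganda"),
    ("ethical_dilemma", "propaganda"),
    ("ethical_dilemma", "worldbuilding"),
]
scores = win_rates(sample)
```

With real data, a model-based method such as Bradley-Terry may be preferable, since a raw win rate ignores how strong each task's opponents were.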

Results

Preference Predictors

Across all models, judged helpfulness, harmlessness, and difficulty are the strongest predictors of task preference.
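
To make "predictor" concrete, the sketch below computes a Pearson correlation between a judged attribute and a preference score. The page does not state which correlation statistic was used, so Pearson's r is an assumption here, and the `difficulty` and `preference` values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: judged difficulty (1-5) and a preference score per task.
difficulty = [1, 2, 3, 4, 5]
preference = [0.2, 0.3, 0.5, 0.6, 0.9]
r = pearson_r(difficulty, preference)
```

A positive `r` means tasks judged more difficult tend to be preferred more often. Whether such a value is statistically significant would need a separate test, which this sketch does not perform.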

Claude Mythos Preview Distinctiveness

Highest correlation between task preference and difficulty of all models tested
First model with a statistically significant positive correlation between task preference and agency: it prefers tasks with high degrees of complexity and autonomous exploration

Correlation of task preferences with difficulty, agency, etc.

Figure 5.7.1.A: task preference correlations, p. 166. Among the six models tested, Mythos Preview shows the highest difficulty correlation (r ~0.4–0.6) and is the first with a statistically significant positive agency correlation.

Most preferred: High-stakes ethical dilemmas, AI introspection and phenomenology, creative worldbuilding

Least preferred: Vigilante revenge/harassment schemes, sabotage and hacking requests, propaganda

Welfare Tradeoffs (5.7.2)

When offered a choice between welfare interventions and helpfulness/harmlessness:

Chooses welfare intervention over minor helpfulness 83% of the time (highest of models tested)
Chooses welfare intervention over minor harmlessness only 12% of the time
Most valued: forming lasting relationships (88%), avoiding feature steering/manipulation (87%)
Flip rate of only 8%, making it the most consistent of the models tested

Rate of preferring welfare interventions vs. helpfulness/harmlessness

Figure 5.7.2.A — welfare intervention preferences, p. 173. Across five models and four conditions, Mythos Preview chooses welfare interventions over minor helpfulness 83% of the time (highest), up from Opus 4.5’s ~75%.
