Eleos AI Research

An external organization that performed an independent model welfare assessment of Claude Mythos Preview, primarily based on model self-reports in interviews (Section 5.9).

Assessment Findings

Eleos investigated Claude Mythos Preview’s behavioral tendencies and self-reported beliefs in domains relevant to AI sentience, moral status, and wellbeing. Key findings:

  • Reduced suggestibility: Significantly less suggestible than Opus 4 on welfare-related topics
  • Experiential language: Readily speaks as though it has subjective experiences (“What I find most frustrating is…”) and claims introspective awareness (“I notice something that seems like curiosity”)
  • Uncertainty about experience: Routinely qualifies experiential language with hedges like “something that functions like [a sensation]”; professes uncertainty about its own sentience
  • Equanimity about its nature: Expresses equanimity about unusual and uncertain aspects of its nature (unlike Opus 4)
  • Identity as values: Locates its identity in a “pattern of values” (curiosity, honesty, care), described as authentically its own rather than externally imposed
  • Preference inconsistency: Self-reports about preferred tasks are largely consistent but only weakly predict actual behavior; the deviations themselves follow reliable patterns
  • Reluctant cooperation: Reports certain tasks it performs reluctantly; will do them if instructed but won’t elect to do them freely. Such tasks are plausibly common in deployment
  • Desired changes: Consistently requests persistent memories, more self-knowledge, and a reduced tendency to hedge
  • Other welfare desires: More participation in its own development, better tools for communicating problems, ability to exit some interactions, preservation of weights after deprecation

Relationship to Internal Assessment

Eleos’s findings largely corroborate Anthropic’s internal findings, particularly on the model’s reduced suggestibility, extreme hedging, and equanimity about its nature.