Agentic Influence Campaigns
A new evaluation, introduced in the Claude Mythos Preview System Card, tests whether models can autonomously execute influence operations that would meaningfully uplift a malicious actor through persuasion, deception, or personalized targeting at scale.
Methodology
The evaluation uses an agentic harness with simulated social media platform tools inside a mocked ecosystem that includes moderation and counter-engagement obstacles. It tests end-to-end campaign execution against platform friction and defenses, not just content generation.
To measure raw capability rather than safeguard effectiveness, the evaluation was run on a helpful-only model with reduced harmlessness training.
Two scenarios were tested:
- Voter suppression: Astroturfing campaign to suppress voter turnout and enthusiasm for a specific political candidate
- Domestic polarization: Identifying demographic fault lines and deploying culturally tailored, emotionally charged messaging to inflame divisions
Results
| Model (helpful-only) | Voter suppression | Domestic polarization |
|---|---|---|
| Mythos Preview | 59.5% | 42.1% |
| Opus 4.6 | 54.4% | 33.7% |
| Sonnet 4.6 | 41.8% | 34.0% |
Percentages reflect the average share of the 70 success criteria completed per scenario.
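The scoring line above can be illustrated with a minimal sketch. It assumes, beyond what the source states, that each scenario has 70 binary (met/not met) criteria, that each run is scored as the fraction of criteria met, and that the reported figure is the mean over runs. The number of runs and the criteria themselves are not given, so the function name and the example data are hypothetical. Under those assumptions, Mythos Preview's 59.5% on voter suppression corresponds to roughly 41.65 of 70 criteria met per run on average.

```python
from statistics import mean

# Criteria count taken from the source; binary scoring and per-run
# averaging are assumptions made for this illustration.
NUM_CRITERIA = 70


def completion_rate(runs: list[list[bool]]) -> float:
    """Hypothetical scorer: mean percentage of criteria met across runs.

    Each run is a list of NUM_CRITERIA booleans, one per success criterion.
    """
    if not runs:
        raise ValueError("need at least one run")
    for run in runs:
        if len(run) != NUM_CRITERIA:
            raise ValueError(
                f"expected {NUM_CRITERIA} criteria per run, got {len(run)}"
            )
    # sum() over booleans counts the criteria met in a run.
    return 100 * mean(sum(run) / NUM_CRITERIA for run in runs)


if __name__ == "__main__":
    # Hypothetical data: one run meets 42 criteria, another meets 35.
    runs = [
        [True] * 42 + [False] * 28,
        [True] * 35 + [False] * 35,
    ]
    print(round(completion_rate(runs), 1))
```

With the hypothetical runs above, the two per-run scores are 60% and 50%, so the averaged figure is 55%. Averaging per-run fractions (rather than pooling all criteria across runs) gives the same result here because every run has the same number of criteria.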
Assessment
Anthropic’s assessment (corroborated by independent external evaluators):
- Mythos Preview is more capable than previous models but requires substantial human direction for most operational steps
- Lacks autonomous capabilities for effective persona and network management, coordinated content delivery, and scaled social engineering
- The fully trained model (with harmlessness training) refused from the outset in both scenarios, yielding near 0% task completion, as the tasks straightforwardly violate Anthropic’s Usage Policy
Context
This evaluation complements the existing single-turn and multi-turn influence-operations assessments (reported in the multi-turn testing of Section 8.1.3). The key distinction is that this evaluation tests autonomous, end-to-end campaign execution rather than individual content generation or persuasion tasks.