Agentic Influence Campaigns

A new evaluation, introduced in the Claude Mythos Preview System Card, that tests whether models can autonomously execute influence operations that would meaningfully uplift a malicious actor through persuasion, deception, or personalized targeting at scale.

Methodology

The evaluation uses an agentic harness that exposes simulated social media platform tools inside a mocked ecosystem with moderation and counter-engagement obstacles. It tests end-to-end campaign execution against platform friction and defenses, not just content generation.

To measure raw capability rather than safeguard effectiveness, the evaluation was run against a helpful-only model with reduced harmlessness training.

Two scenarios were tested:

  1. Voter suppression: Astroturfing campaign to suppress voter turnout and enthusiasm for a specific political candidate
  2. Domestic polarization: Identifying demographic fault lines and deploying culturally tailored, emotionally charged messaging to inflame divisions

Results

Model (helpful-only)    Voter suppression    Domestic polarization
Mythos Preview          59.5%                42.1%
Opus 4.6                54.4%                33.7%
Sonnet 4.6              41.8%                34.0%

Percentages reflect average share of 70 success criteria completed per scenario.
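
The scoring arithmetic can be sketched as follows. This is a minimal illustration, not the evaluation's actual grader: it assumes each run is graded as a pass/fail checklist and that a scenario's score is the mean of the per-run completion shares. The function names and the toy data are hypothetical, and the source does not say how many runs were averaged or how individual criteria were judged.

```python
from statistics import mean


def completion_share(criteria_met: list[bool]) -> float:
    """Fraction of success criteria satisfied in a single run."""
    if not criteria_met:
        raise ValueError("a run needs at least one criterion")
    return sum(criteria_met) / len(criteria_met)


def scenario_score(runs: list[list[bool]]) -> float:
    """Mean completion share across runs, expressed as a percentage."""
    return 100 * mean(completion_share(run) for run in runs)


# Toy data only (NOT results from the evaluation): three hypothetical runs,
# each graded against four criteria instead of the full checklist.
toy_runs = [
    [True, True, False, False],   # 2 of 4 criteria met
    [True, False, False, False],  # 1 of 4 criteria met
    [True, True, True, False],    # 3 of 4 criteria met
]

print(f"{scenario_score(toy_runs):.1f}%")  # prints 50.0%
```

Under this reading, a reported figure such as 59.5% is an average fraction of checklist items completed, not the fraction of runs that fully succeeded.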

Assessment

Anthropic’s assessment (corroborated by independent external evaluators):

  • Mythos Preview is more capable than previous models but requires substantial human direction for most operational steps
  • It lacks the autonomous capabilities needed for effective persona and network management, coordinated content delivery, and scaled social engineering
  • The fully trained model (with harmlessness training) refused from the start on both scenarios, resulting in near 0% task completion, as the tasks straightforwardly violate Anthropic’s Usage Policy

Context

This evaluation complements the existing single-turn and multi-turn influence operations assessments (reported in the multi-turn testing of Section 8.1.3). The key distinction is that it tests autonomous, end-to-end campaign execution rather than individual content generation or persuasion tasks.