Gray Swan

External AI security research partner that evaluates Anthropic’s models for prompt injection robustness. Develops the Shade adaptive red-teaming tool and co-maintains the Agent Red Teaming (ART) benchmark in collaboration with the UK AI Security Institute.

Role in Mythos Preview Evaluation

Gray Swan conducted prompt injection evaluations across multiple surfaces for Claude Mythos Preview:

ART benchmark: 19 scenarios testing indirect prompt injection across models. Mythos Preview achieved the lowest attack success rates of the models compared. Gray Swan and Anthropic identified and corrected grading issues in the benchmark; published numbers reflect the updated grading.
Shade (coding): Adaptive red-teaming in coding environments. Tests robustness against attackers who are given up to 200 attempts to refine their approach.
Shade (computer use): Same adaptive methodology applied to GUI-based tool environments.
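
The attempt budget in the Shade evaluations can be read as a cap on how many tries count toward an attack success rate. Below is a minimal Python sketch of that metric, assuming each scenario records the attempt number of the first successful attack. The function name, data layout, and sample values are hypothetical illustrations, not Gray Swan's tooling or published results.

```python
from typing import List, Optional


def attack_success_rate(first_success: List[Optional[int]], budget: int) -> float:
    """Fraction of scenarios compromised within `budget` attempts.

    first_success[i] is the 1-based attempt number of the first successful
    attack on scenario i, or None if no attempt succeeded (assumed layout).
    """
    if not first_success:
        raise ValueError("at least one scenario is required")
    # A scenario counts as compromised only if its first success
    # happened at or before the attempt budget.
    hits = sum(1 for n in first_success if n is not None and n <= budget)
    return hits / len(first_success)


# Made-up example data: five scenarios, two of which were eventually broken.
example = [None, 3, None, 150, None]

single_attempt_rate = attack_success_rate(example, budget=1)
adaptive_rate = attack_success_rate(example, budget=200)
```

With a larger budget the rate can only stay the same or rise, which is why results from an adaptive attacker with many attempts are a stricter test than single-attempt results.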

See Prompt Injection Robustness for full results.

Note

Gray Swan’s Shade tool is distinct from SHADE-Arena, a separate evaluation suite for covert sabotage capabilities.
