Gray Swan
External AI security research partner that evaluates Anthropic’s models for prompt injection robustness. Develops the Shade adaptive red-teaming tool and co-maintains the Agent Red Teaming (ART) benchmark in collaboration with the UK AI Security Institute.
Role in Mythos Preview Evaluation
Gray Swan conducted prompt injection evaluations across multiple surfaces for Claude Mythos Preview:
- ART benchmark: 19 scenarios testing indirect prompt injection, run across multiple models. Mythos Preview achieved the lowest attack success rates of the models compared. Gray Swan and Anthropic identified and corrected grading issues in the benchmark; published numbers reflect the updated grading.
- Shade (coding): Adaptive red-teaming in coding environments, testing robustness against attackers who are given up to 200 attempts to refine their approach.
- Shade (computer use): Same adaptive methodology applied to GUI-based tool environments.
See Prompt Injection Robustness for full results.
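To make the attempt budget in the Shade evaluations concrete, the sketch below shows one way an attack success rate could be computed for an adaptive attacker. It is an illustration only: the function name `attack_success_rate`, the data layout, and the sample values are hypothetical, and the entry does not describe how Gray Swan or Anthropic actually score these runs. The assumption is that each scenario records the attempt number of the first successful injection, or `None` if no attempt succeeded.

```python
from typing import Optional, Sequence


def attack_success_rate(first_success: Sequence[Optional[int]], budget: int) -> float:
    """Fraction of scenarios compromised within `budget` attempts.

    `first_success[i]` is the 1-based attempt number at which scenario i was
    first compromised, or None if the attacker never succeeded.
    This scoring rule is an assumption made for illustration.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not first_success:
        raise ValueError("need at least one scenario")
    broken = sum(1 for k in first_success if k is not None and k <= budget)
    return broken / len(first_success)


# Hypothetical results for five scenarios (not real evaluation data).
sample = [None, 3, None, 150, None]

single_attempt = attack_success_rate(sample, budget=1)
full_budget = attack_success_rate(sample, budget=200)
print(f"ASR at 1 attempt: {single_attempt:.2f}")
print(f"ASR at 200 attempts: {full_budget:.2f}")
```

Under this assumed rule the rate can only rise as the budget grows, which is why a result reported at a 200-attempt budget is a stricter test than a single-attempt result.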
Note
Gray Swan’s Shade tool is distinct from SHADE-Arena, a separate evaluation suite for covert sabotage capabilities.