Claude Mythos Wiki
Tag: alignment
14 items with this tag.
Apr 10, 2026
Automated Behavioral Audit
concept
evaluation
alignment
methodology
Apr 10, 2026
Claude's Constitution
concept
alignment
training
values
identity
Apr 10, 2026
Constitutional Adherence
concept
alignment
evaluation
constitution
character
Apr 10, 2026
Covert Capabilities
concept
alignment
safety
evaluation
stealth
Apr 10, 2026
Evaluation Awareness
concept
alignment
evaluation
interpretability
Apr 10, 2026
Honesty & Hallucinations
concept
honesty
hallucinations
factuality
refusals
alignment
Apr 10, 2026
Model Welfare
concept
welfare
ethics
psychology
alignment
Apr 10, 2026
Reckless Agentic Behavior
concept
alignment
safety
agentic
incidents
Apr 10, 2026
Reward Hacking
concept
alignment
evaluation
autonomy
Apr 10, 2026
Sandbagging
concept
alignment
evaluation
safety
Apr 10, 2026
White-Box Interpretability
concept
interpretability
alignment
methodology
Apr 10, 2026
Section 4a: Alignment Assessment (Part 1)
source
alignment
safety
evaluation
behavioral-audit
Apr 10, 2026
Section 4b: Alignment Assessment (Part 2)
source
alignment
interpretability
safety
evaluation
Apr 10, 2026
Section 5: Model Welfare Assessment
source
model-welfare
psychology
alignment