descriptive claim
Anthropic's ASL-3 deployment architecture uses Constitutional Classifiers --- real-time classifier guards trained on synthetic harmful and harmless CBRN prompt/completion data --- combined with a bug bounty program, offline classification, and threat intelligence partnerships to detect and iteratively patch universal jailbreaks.
desc_asl3_constitutional_classifiers_deployment
confidence 0.88
Evidence (1)
supports (1)
- Activating AI Safety Level 3 protections primary_testimonyweight0.90
locator: Deployment Measures section
“We have implemented Constitutional Classifiers—a system where real-time classifier guards, trained on synthetic data representing harmful and harmless CBRN-related prompts and completions, monitor model inputs and outputs”