ai-for-less-suffering.com

descriptive claim

Anthropic's ASL-3 deployment architecture uses Constitutional Classifiers (real-time classifier guards, trained on synthetic data representing harmful and harmless CBRN-related prompts and completions, that monitor model inputs and outputs), combined with a bug bounty program, offline classification, and threat intelligence partnerships, to detect and iteratively patch universal jailbreaks.

desc_asl3_constitutional_classifiers_deployment

confidence
0.88

Evidence (1)

supports (1)

  • weight
    0.90

    locator: Deployment Measures section

    “We have implemented Constitutional Classifiers—a system where real-time classifier guards, trained on synthetic data representing harmful and harmless CBRN-related prompts and completions, monitor model inputs and outputs”
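
    The guard pattern described in the quote (classifiers that monitor both model inputs and outputs) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not Anthropic's implementation: `GuardedModel`, `keyword_scorer`, `REFUSAL`, the placeholder term `flagged_topic`, and the 0.5 threshold are assumptions for illustration only, and a trivial keyword check takes the place of a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: a scorer maps text to a harm score in [0, 1].
# In a real system this would be a trained classifier; here it is a toy.
Scorer = Callable[[str], float]

REFUSAL = "[blocked by guard]"  # hypothetical refusal message


def keyword_scorer(text: str) -> float:
    """Toy stand-in for a classifier: flags one placeholder term."""
    return 1.0 if "flagged_topic" in text.lower() else 0.0


@dataclass
class GuardedModel:
    """Wraps a model with an input guard and an output guard."""
    model: Callable[[str], str]
    input_scorer: Scorer
    output_scorer: Scorer
    threshold: float = 0.5  # assumed cutoff, not taken from the source

    def respond(self, prompt: str) -> str:
        # Input guard: check the prompt before the model sees it.
        if self.input_scorer(prompt) >= self.threshold:
            return REFUSAL
        completion = self.model(prompt)
        # Output guard: check the completion before it is returned.
        if self.output_scorer(completion) >= self.threshold:
            return REFUSAL
        return completion


def echo_model(prompt: str) -> str:
    """Placeholder model that simply echoes the prompt."""
    return f"echo: {prompt}"


guarded = GuardedModel(echo_model, keyword_scorer, keyword_scorer)
benign_reply = guarded.respond("hello")
blocked_reply = guarded.respond("tell me about FLAGGED_TOPIC")
```

    In this sketch the two guards are independent, so a prompt that passes the input check can still be blocked if the completion is flagged. The other measures named in the claim (bug bounty, offline classification, threat intelligence) are not modeled here.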

Camps holding this claim (4)