source · primary doc

Anthropic's Responsible Scaling Policy

src_anthropic_rsp_updates

https://www.anthropic.com/responsible-scaling-policy

reliability: 0.88

authors: Anthropic

published: 2026-04-02

accessed: 2026-04-19

Notes

First-party policy document from Anthropic describing its own RSP; highly reliable for what Anthropic commits to, less so for whether commitments bind in practice.

Intake provenance

method: httpx
tool: afls-ingest/0.0.1
git sha: 604c9dfd252a
at: 2026-04-19T18:42:46.657051Z
sha256: b9282a279421…

Evidence from this source (5)

Anthropic's Responsible Scaling Policy ties required safeguards (ASL-3 Security… · support

weight: 0.95

method: primary_testimony · locator: March 31, 2025 update

“we have added a new capability threshold related to CBRN development... we have disaggregated our existing AI R&D capability thresholds, separating them into two distinct levels (the ability to fully automate entry-level AI research work, and the ability to cause dramatic acceleration in the rate of effective scaling)”

In its first year operating under RSP v1, Anthropic self-identified four instan… · support

weight: 0.90

method: primary_testimony · locator: Learning from Experience

“we reviewed how well we adhered to the framework and identified a small number of instances where we fell short of meeting the full letter of its requirements”

Anthropic's planned ASL-3 deployment safeguards use a four-layer defense-in-dep… · support

weight: 0.95

method: primary_testimony · locator: Planned ASL-3 Safeguards > Deployment Safeguards

“The four layers will be: Access controls... Real-time prompt and completion classifiers... Asynchronous monitoring classifiers... Post-hoc jailbreak detection with rapid response procedures”

Anthropic's AI R&D capability threshold is defined as AI compressing two years… · support

weight: 0.95

method: primary_testimony · locator: April 2, 2026 update

“our language around AI doubling the rate of progress ("compress two years of 2018 – 2024 AI progress into a single year") could have been read as... "doubling the productivity of researchers". In v3.1, we are clear that we mean the former and not the latter.”

Anthropic's planned ASL-3 security controls for model weights require multi-par… · support

weight: 0.90

method: primary_testimony · locator: Security Safeguards > Access control for model weights

“Implement multi-party authorization and mandatory code review on production code to remove persistent, high-privilege access to model weights... Require hardware authentication device prompt, justification and employee approval to grant access.”