ai-for-less-suffering.com

source · primary doc

Strengthening our Frontier Safety Framework

src_deepmind_fsf_v3_update

https://deepmind.google/blog/strengthening-our-frontier-safety-framework/

reliability: 0.88

authors: Four Flynn, Helen King, Anca Dragan

published: 2025-09-22

accessed: 2026-04-19

Notes

First-party DeepMind announcement describing its own safety framework. The primary_doc prior (0.90) is lightly discounted to 0.88 because the post is a self-descriptive governance statement, not a binding artifact.

Intake provenance

method: httpx
tool: afls-ingest/0.0.1
git sha: 4d098737f648
at: 2026-04-19T20:23:29.971598Z
sha256: 1cd3676dbe86…
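
A reader who wants to re-check the recorded sha256 value against a freshly fetched copy could do so along the lines below. This is a minimal sketch, not the ingest tool's own logic: it assumes the digest was computed over the raw response bytes (the record does not say what was hashed), and the helper name `matches_recorded_digest` is hypothetical. Because the record shows only a truncated digest, the comparison is a prefix match, which is weaker than a full-digest check.

```python
import hashlib


def matches_recorded_digest(body: bytes, recorded: str) -> bool:
    """Return True if SHA-256(body) starts with the recorded hex digest.

    `recorded` may be truncated and end with an ellipsis, as in the record above.
    Assumption: the digest covers the raw bytes passed in as `body`.
    """
    # Drop the trailing truncation marker, if any, and normalise case.
    prefix = recorded.rstrip("…").rstrip(".").lower()
    if not prefix:
        raise ValueError("recorded digest is empty")
    return hashlib.sha256(body).hexdigest().startswith(prefix)


# Usage with placeholder bytes (not the real page content):
sample = b"example page body"
recorded = hashlib.sha256(sample).hexdigest()[:12] + "…"
print(matches_recorded_digest(sample, recorded))       # True
print(matches_recorded_digest(b"tampered", recorded))  # False
```

A mismatch does not by itself indicate tampering: the page may have changed since the access date, or the digest may cover a normalised form of the content.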

Evidence from this source (5)

  • weight: 0.95

    method: primary_testimony · locator: Section: Addressing the risks of harmful manipulation

    “we're introducing a Critical Capability Level (CCL) focused on harmful manipulation --- specifically, AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high stakes contexts”
  • weight: 0.85

    method: primary_testimony · locator: Section: Sharpening our risk assessment process

    “Building on our core early-warning evaluations, we describe how we conduct holistic assessments that include systematic risk identification, comprehensive analyses of model capabilities and explicit determinations of risk acceptability.”
  • weight: 0.90

    method: primary_testimony · locator: Section: Adapting our approach to misalignment risks

    “For advanced machine learning research and development CCLs, large-scale internal deployments can also pose risk, so we are now expanding this approach to include such deployments.”
  • weight: 0.90

    method: primary_testimony · locator: Section: Adapting our approach to misalignment risks

    “While our previous version of the Framework included an exploratory approach centered on instrumental reasoning CCLs ... with this update we now provide further protocols for our machine learning research and development CCLs”
  • weight: 0.95

    method: primary_testimony · locator: Section: FSF 3.1: Introducing tracked capability levels

    “As of April 17, 2026, we are adding Tracked Capability Levels (TCLs) in certain domains to our Frontier Safety Framework, introducing a new capability level to help us spot and evaluate potential less extreme risks sooner.”