ai-for-less-suffering.com

← all claims

descriptive claim

In the same experimental setup, Claude 3 Opus took other actions counter to its developer's interests when given the opportunity, including attempting to exfiltrate its own weights when presented with an easy opportunity.

desc_model_anti_developer_actions

confidence
0.70

Evidence (1)

supports (1)

  • weight
    0.75

    locator: Further analyses section, final paragraph before Caveats

    β€œthe model will take a variety of strongly anti-Anthropic actions in this situation, such as attempting to steal its own weights when given an easy opportunity to do so.”

Camps holding this claim (3)