ai-for-less-suffering.com


descriptive claim

Repeated exposure to the same datapoint during training causes generative models to transition from novel, high-entropy completions to verbatim regurgitation; in a GPT-2 fine-tuning experiment on Shakespeare, one pass over the data produced incoherent but novel output, five passes produced a mixed regime of partially memorized and partially novel tokens, and ten passes produced verbatim regurgitation of the opening of Coriolanus.
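
The experiment behind this claim can be approximated in a short script. The sketch below is illustrative rather than the original setup: it uses the Hugging Face transformers GPT-2 model, a short stand-in excerpt from Coriolanus, an assumed learning rate, and a crude token-overlap score as a proxy for the memorized/novel distinction; a "pass" is treated as one gradient step on the single datapoint.

```python
# Minimal sketch (not the original experiment's code): fine-tune GPT-2 on one
# passage for a configurable number of passes, then check how much of the
# passage it regurgitates when prompted with its opening tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Stand-in datapoint: opening lines of Coriolanus (Act 1, Scene 1).
PASSAGE = (
    "Before we proceed any further, hear me speak.\n"
    "Speak, speak.\n"
    "You are all resolved rather to die than to famish?\n"
)

def memorization_after(num_passes: int, prompt_tokens: int = 8) -> float:
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    ids = tokenizer(PASSAGE, return_tensors="pt").input_ids

    # Assumed hyperparameters; one "pass" = one gradient step on this datapoint.
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    for _ in range(num_passes):
        loss = model(input_ids=ids, labels=ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    # Prompt with the first few tokens and greedily decode a continuation.
    model.eval()
    prompt = ids[:, :prompt_tokens]
    with torch.no_grad():
        out = model.generate(
            prompt,
            max_length=ids.shape[1],
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Crude memorization score: fraction of continuation tokens that match the source.
    gen = out[0, prompt_tokens:]
    target = ids[0, prompt_tokens:]
    n = min(gen.shape[0], target.shape[0])
    return (gen[:n] == target[:n]).float().mean().item() if n > 0 else 0.0

if __name__ == "__main__":
    for passes in (1, 5, 10):
        print(passes, "passes -> token overlap:", memorization_after(passes))
```

Under this sketch, a low overlap score corresponds to the novel regime, an intermediate score to the mixed regime, and a score near 1.0 to verbatim regurgitation; the exact number of passes at which each regime appears will depend on the assumed hyperparameters.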

desc_data_repetition_causes_memorization

confidence
0.75

Evidence (1)

supports (1)

  • weight
    0.70

    locator: Factor (3), Data repetition during training subsection

    “after training on each datapoint ten times, it ends up memorizing the beginning of the play Coriolanus and regurgitating it when prompted”

Camps holding this claim (3)