
descriptive claim

Post-training via reinforcement learning from human feedback (RLHF) systematically reduces the per-token entropy of model outputs, because reducing sampling randomness reduces hallucination rates; under the entropy-ordering assumption, per-token entropy measurements on sampled ChatGPT queries imply that roughly 73-94% of output information corresponds to information present in the training dataset.
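
A minimal sketch of the per-token entropy measurement this claim relies on, assuming access to token-level log-probabilities such as the top-k logprobs some sampling APIs expose (truncating to top-k understates the true entropy). All names here are illustrative, not from the source:

```python
import math

def per_token_entropy(logprobs):
    """Shannon entropy in bits of one next-token distribution,
    given natural-log probabilities for the candidate tokens."""
    return -sum(math.exp(lp) * lp for lp in logprobs) / math.log(2)

def mean_entropy(sequence_logprobs):
    """Mean per-token entropy (bits) over an output sequence; lower
    values mean the model concentrates mass on fewer tokens."""
    return sum(per_token_entropy(d) for d in sequence_logprobs) / len(sequence_logprobs)

# Toy check: a peaked distribution carries far less entropy than a uniform one.
flat = [math.log(0.25)] * 4                                        # uniform over 4 tokens
peaked = [math.log(0.97)] + [math.log(0.01)] * 3                   # near-deterministic
print(per_token_entropy(flat))    # ~2.0 bits
print(per_token_entropy(peaked))  # ~0.24 bits
```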

desc_rlhf_reduces_output_entropy

confidence
0.60

Evidence (1)

supports (1)

  • weight: 0.55
    locator: Factor (3), Reinforcement learning subsection
    "If H(X) ~ 0.95 bits per character, we'd estimate between 73% to 94% of these outputs correspond to information in the training dataset."
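
One hedged reconstruction of how the quoted range could arise, assuming (the evidence does not state this) that the fraction of output information attributed to training data is 1 - h_sampled / H(X), where h_sampled is the per-character entropy injected by sampling; the specific values below are back-solved to hit the quoted endpoints:

```python
H_X = 0.95  # assumed per-character entropy of the source text, in bits

def training_fraction(h_sampled):
    """Hypothetical reconstruction: fraction of output information
    attributed to training data, if h_sampled bits/char come from
    sampling randomness and the remainder from the training set."""
    return 1.0 - h_sampled / H_X

# Per-character sampling entropies of ~0.057 and ~0.257 bits would
# reproduce the quoted 94% and 73% endpoints, respectively.
print(round(training_fraction(0.057), 2))  # 0.94
print(round(training_fraction(0.257), 2))  # 0.73
```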

Camps holding this claim (4)