descriptive claim
Post-training via reinforcement learning from human feedback (RLHF) systematically reduces the per-token entropy of model outputs, since lowering sampling randomness lowers hallucination rates. In sampled ChatGPT queries, per-token entropy measurements imply that roughly 73–94% of output information corresponds to information present in the training dataset, under the entropy-ordering assumption.
desc_rlhf_reduces_output_entropy
confidence 0.60
Evidence (1)
supports (1)
- When does generative AI qualify for fair use? (expert_estimate, weight 0.55)
locator: Factor (3), Reinforcement learning subsection
"If H(X) ~ 0.95 bits per character, we'd estimate between 73% to 94% of these outputs correspond to information in the training dataset."
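For reference, a minimal sketch of the per-token entropy measurement this claim relies on: H(X) = -Σ p·log₂ p averaged over the model's next-token distributions, then normalized to bits per character. The model name, sample text, and use of a local open model are illustrative assumptions; the cited essay measured ChatGPT outputs, which this sketch does not reproduce.

```python
# Sketch: estimate a causal LM's per-token output entropy in bits/char.
# Assumptions: "gpt2" stands in for the RLHF-tuned model discussed in the
# claim; the sample text stands in for sampled ChatGPT query outputs.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder, not the model from the cited essay
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."  # placeholder sample
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, vocab_size)

# Entropy of each next-token distribution, converted from nats to bits.
log_probs = torch.log_softmax(logits, dim=-1)
entropy_nats = -(log_probs.exp() * log_probs).sum(dim=-1)
entropy_bits = entropy_nats / math.log(2)

bits_per_token = entropy_bits.mean().item()
bits_per_char = bits_per_token * enc.input_ids.shape[1] / len(text)
print(f"H(X): {bits_per_token:.2f} bits/token, ~{bits_per_char:.2f} bits/char")
```

Under the entropy-ordering assumption in the claim, a lower measured H(X) leaves less of the output's information content attributable to sampling randomness, and correspondingly more attributable to the training data; comparing this figure before and after RLHF post-training would test the claim directly.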