
descriptive claim

Post-training via reinforcement learning from human feedback (RLHF) systematically reduces the per-token entropy of model outputs, because reducing sampling randomness reduces hallucination rates; under the entropy-ordering assumption, per-token entropy measurements on sampled ChatGPT queries imply that roughly 73-94% of output information corresponds to information present in the training dataset.
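
A minimal sketch of the per-token entropy measurement this claim relies on, assuming access to token-level log-probabilities such as the top-k logprobs some sampling APIs expose (truncating to top-k understates the true entropy). All names here are illustrative, not from the source:

```python
import math

def per_token_entropy(logprobs):
    """Shannon entropy in bits of one next-token distribution,
    given natural-log probabilities for the candidate tokens."""
    return -sum(math.exp(lp) * lp for lp in logprobs) / math.log(2)

def mean_entropy(sequence_logprobs):
    """Mean per-token entropy (bits) over an output sequence; lower
    values mean the model concentrates mass on fewer tokens."""
    return sum(per_token_entropy(d) for d in sequence_logprobs) / len(sequence_logprobs)

# Toy check: a peaked distribution carries far less entropy than a uniform one.
flat = [math.log(0.25)] * 4                                        # uniform over 4 tokens
peaked = [math.log(0.97)] + [math.log(0.01)] * 3                   # near-deterministic
print(per_token_entropy(flat))    # ~2.0 bits
print(per_token_entropy(peaked))  # ~0.24 bits
```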

desc_rlhf_reduces_output_entropy

confidence
0.60

Evidence (1)

supports (1)

  • weight: 0.55
    locator: Factor (3), Reinforcement learning subsection
    "If H(X) ~ 0.95 bits per character, we'd estimate between 73% to 94% of these outputs correspond to information in the training dataset."
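
One hedged reconstruction of how the quoted range could arise, assuming (the evidence does not state this) that the fraction of output information attributed to training data is 1 - h_sampled / H(X), where h_sampled is the per-character entropy injected by sampling; the specific values below are back-solved to hit the quoted endpoints:

```python
H_X = 0.95  # assumed per-character entropy of the source text, in bits

def training_fraction(h_sampled):
    """Hypothetical reconstruction: fraction of output information
    attributed to training data, if h_sampled bits/char come from
    sampling randomness and the remainder from the training set."""
    return 1.0 - h_sampled / H_X

# Per-character sampling entropies of ~0.057 and ~0.257 bits would
# reproduce the quoted 94% and 73% endpoints, respectively.
print(round(training_fraction(0.057), 2))  # 0.94
print(round(training_fraction(0.257), 2))  # 0.73
```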

Camps holding this claim (4)