source · blog

When does generative AI qualify for fair use?

src_balaji_fair_use

https://suchir.net/fair_use.html

reliability: 0.45

authors: Suchir Balaji

published: 2024-10-23

accessed: 2026-04-19

Notes

Personal blog post by a former OpenAI researcher applying the four-factor fair use test to ChatGPT. Rated above the baseline blog prior (0.35) because the author has direct domain expertise (machine learning, plus firsthand knowledge of OpenAI training practices) and grounds the legal reasoning in cited case law; it is still a blog post, neither peer-reviewed nor adjudicated.

Intake provenance

method: httpx
tool: afls-ingest/0.0.1
git sha: 604c9dfd252a
at: 2026-04-19T18:50:56.387914Z
sha256: 73fcd8fee88e…

Evidence from this source (5)

Repeated exposure to the same datapoint during training causes generative model… · support

weight: 0.70

method: direct_measurement · locator: Factor (3), Data repetition during training subsection

“after training on each datapoint ten times, it ends up memorizing the beginning of the play Coriolanus and regurgitating it when prompted”

Major model developers including OpenAI and Google have signed paid data-licens… · support

weight: 0.75

method: journalistic_report · locator: Factor (4) section

“Model developers like OpenAI and Google have also signed many data licensing agreements to train their models on copyrighted data: for example with Stack Overflow, Reddit, The Associated Press, News Corp, etc.”

Post-training via reinforcement learning from human feedback (RLHF) systematica… · support

weight: 0.55

method: expert_estimate · locator: Factor (3), Reinforcement learning subsection

“If H(X) ~ 0.95 bits per character, we'd estimate between 73% to 94% of these outputs correspond to information in the training dataset.”

Under the assumption that a generative model's output entropy is at most the tr… · support

weight: 0.80

method: expert_estimate · locator: Factor (3) section, RMI derivation

“When H(Y) <= H(X), we can bound the RMI from below as [1 - H(Y)/H(X)]”
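
The two Factor (3) excerpts connect through simple arithmetic. Below is a minimal Python sketch, assuming that the quoted lower bound 1 - H(Y)/H(X) is what yields the quoted 73% to 94% range at H(X) ~ 0.95 bits per character; the excerpts themselves do not state the H(Y) values the author used. It evaluates the bound and inverts it to show which output entropy H(Y) would produce each end of the range. The function names are hypothetical and not from the source.

```python
def rmi_lower_bound(h_y: float, h_x: float) -> float:
    """Quoted lower bound on RMI: 1 - H(Y)/H(X), stated for H(Y) <= H(X).

    h_y: entropy of model outputs, bits per character
    h_x: entropy of the training data, bits per character
    """
    if h_x <= 0:
        raise ValueError("H(X) must be positive")
    if h_y < 0 or h_y > h_x:
        raise ValueError("the quoted bound assumes 0 <= H(Y) <= H(X)")
    return 1.0 - h_y / h_x


def implied_output_entropy(rmi: float, h_x: float) -> float:
    """Invert the bound: the H(Y) at which 1 - H(Y)/H(X) equals `rmi`.

    Hypothetical helper; the inversion is ours, not the author's.
    """
    if not 0.0 <= rmi <= 1.0:
        raise ValueError("rmi must lie in [0, 1]")
    return h_x * (1.0 - rmi)


if __name__ == "__main__":
    H_X = 0.95  # bits per character, the figure quoted in the source
    for target in (0.73, 0.94):
        h_y = implied_output_entropy(target, H_X)
        # Round-trip through the bound to confirm the inversion.
        print(f"RMI >= {target:.0%} needs H(Y) ~ {h_y:.4f} bits/char "
              f"(check: {rmi_lower_bound(h_y, H_X):.2f})")
```

Under that assumption, the 73% and 94% endpoints correspond to output entropies of roughly 0.2565 and 0.057 bits per character: the lower the entropy of the model's outputs, the larger the guaranteed share attributable to the training data.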

Empirical studies have measured a ~12% decline in Stack Overflow traffic follow… · support

weight: 0.55

method: journalistic_report · locator: Factor (4) section, citing 'The consequences of generative AI for online knowledge communities'

“traffic to Stack Overflow declined by about 12% after the release of ChatGPT”