source · blog
When does generative AI qualify for fair use?
src_balaji_fair_use
https://suchir.net/fair_use.html
authors: Suchir Balaji
published: 2024-10-23
accessed: 2026-04-19
Notes
Personal blog post by a former OpenAI researcher applying the four-factor fair use test to ChatGPT. Above the baseline blog prior (0.35) because the author has direct domain expertise (ML plus firsthand knowledge of OpenAI training practices) and the legal reasoning is grounded in cited case law. It is still a blog post, neither peer-reviewed nor adjudicated.
Intake provenance
- method: httpx
- tool: afls-ingest/0.0.1
- git sha: 604c9dfd252a
- at: 2026-04-19T18:50:56.387914Z
- sha256: 73fcd8fee88e…
Evidence from this source (5)
- weight: 0.70
method: direct_measurement · locator: Factor (3), Data repetition during training subsection
“after training on each datapoint ten times, it ends up memorizing the beginning of the play Coriolanus and regurgitating it when prompted”
- weight: 0.75
method: journalistic_report · locator: Factor (4) section
“Model developers like OpenAI and Google have also signed many data licensing agreements to train their models on copyrighted data: for example with Stack Overflow, Reddit, The Associated Press, News Corp, etc.”
- weight: 0.55
method: expert_estimate · locator: Factor (3), Reinforcement learning subsection
“If H(X) ~ 0.95 bits per character, we'd estimate between 73% to 94% of these outputs correspond to information in the training dataset.”
- weight: 0.80
method: expert_estimate · locator: Factor (3) section, RMI derivation
“When H(Y) <= H(X), we can bound the RMI from below as [1 - H(Y)/H(X)]”
- weight: 0.55
method: journalistic_report · locator: Factor (4) section, citing 'The consequences of generative AI for online knowledge communities'
“traffic to Stack Overflow declined by about 12% after the release of ChatGPT”
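Worked check of the RMI bound quoted in the evidence above. This is a minimal sketch, not the author's code. It evaluates the quoted lower bound 1 - H(Y)/H(X) and inverts it to show which H(Y) values would produce the quoted 73% and 94% figures at H(X) = 0.95 bits per character. This card does not define X and Y, so the sketch treats H(X) and H(Y) only as two entropies in bits per character. It is an assumption that the 73% to 94% range is an instance of this bound. The function names are hypothetical.

```python
def rmi_lower_bound(h_y: float, h_x: float) -> float:
    """Lower bound 1 - H(Y)/H(X); the quoted bound applies only when H(Y) <= H(X)."""
    if h_x <= 0:
        raise ValueError("H(X) must be positive")
    if h_y < 0 or h_y > h_x:
        raise ValueError("bound requires 0 <= H(Y) <= H(X)")
    return 1.0 - h_y / h_x


def implied_output_entropy(bound: float, h_x: float) -> float:
    """Invert the bound: the H(Y) that yields a given lower bound at a given H(X)."""
    return h_x * (1.0 - bound)


# H(X) value taken from the quoted evidence; the two bounds are the quoted range.
H_X = 0.95
for bound in (0.73, 0.94):
    h_y = implied_output_entropy(bound, H_X)
    print(f"bound={bound:.2f} -> implied H(Y) = {h_y:.4f} bits/char")
```

Under these assumptions the quoted range corresponds to H(Y) between roughly 0.06 and 0.26 bits per character. These are back-derived values, not figures stated in this card.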