descriptive claim
Reinforcement-learning-based reasoning training (the second-stage paradigm behind o1 and R1) is at an early crossover point on its scaling curve where spending $1M instead of $0.1M produces large capability gains, allowing multiple labs to temporarily produce comparable reasoning models.
desc_rl_reasoning_crossover_point
confidence 0.70
Evidence (1)
supports (1)
- On DeepSeek and Export Controls expert_estimate weight 0.75
locator: Three Dynamics section, #3 Shifting the paradigm
"Spending $1M instead of $0.1M is enough to get huge gains... we're at a unique 'crossover point' where there is a powerful new paradigm that is early on the scaling curve."