descriptive claim
DeepSeek's R1 work demonstrates that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), without human-labeled reasoning trajectories. The RL-trained model outperforms counterparts trained via conventional supervised learning on human demonstrations on verifiable tasks such as mathematics, coding competitions, and STEM fields.
desc_r1_pure_rl_incentivizes_reasoning
confidence 0.85
Evidence (2)
supports (2)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning · direct_measurement · weight 0.85
locator: Abstract
“the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations.”
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning · direct_measurement · weight 0.90
locator: Abstract
“Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories.”