source · paper
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
src_deepseek_r1_paper
https://arxiv.org/abs/2501.12948
reliability: 0.85
authors: DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Wenfeng Liang
published: 2025-01-22
accessed: 2026-04-19
Notes
Reliability nudged above the paper-type prior (0.82) because the work was peer-reviewed and published in Nature (vol. 645, pp. 633-638, 2025) in addition to appearing as an arXiv preprint.
Intake provenance
- method: httpx
- tool: afls-ingest/0.0.1
- git sha: 4d098737f648
- at: 2026-04-19T20:47:57.761518Z
- sha256: 57a5dc3bd995…
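A minimal sketch of how provenance fields like the ones above could be produced. The record does not describe afls-ingest's internals, so the `ingest` function, its parameters, and the subset of fields returned are assumptions for illustration only; only the use of httpx and a SHA-256 digest of the fetched bytes are taken from the record itself.

```python
# Hypothetical sketch, not the actual afls-ingest implementation:
# fetch the source URL, hash the raw response bytes, and timestamp the intake.
import hashlib
from datetime import datetime, timezone

import httpx


def ingest(url: str) -> dict:
    """Fetch a source URL and return basic provenance metadata."""
    response = httpx.get(url, follow_redirects=True, timeout=30.0)
    response.raise_for_status()
    raw = response.content  # raw bytes as served, hashed before any parsing
    return {
        "method": "httpx",
        "at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


if __name__ == "__main__":
    print(ingest("https://arxiv.org/abs/2501.12948"))
```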
Evidence from this source (4)
- weight: 0.85
method: direct_measurement · locator: Abstract
“the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations.”
- weight: 0.85
method: direct_measurement · locator: Abstract
“The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation.”
- weight: 0.90
method: direct_measurement · locator: Abstract
“Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories.”
- weight: 0.80
method: direct_measurement · locator: Abstract
“the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.”