source · paper
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
src_deepseek_r1_paper
https://arxiv.org/abs/2501.12948
reliability: 0.85
authors: DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Wenfeng Liang
published: 2025-01-22
accessed: 2026-04-19
Notes
Reliability nudged above the paper-type prior (0.82) because the work was peer-reviewed and published in Nature (vol. 645, pp. 633-638, 2025) in addition to appearing as an arXiv preprint.
Intake provenance
- method: httpx
- tool: afls-ingest/0.0.1
- git sha: 4d098737f648
- at: 2026-04-19T20:47:57.761518Z
- sha256: 57a5dc3bd995…
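A minimal sketch of how provenance fields like the ones above could be produced. The record does not describe afls-ingest's internals, so the `ingest` function, its parameters, and the subset of fields returned are assumptions for illustration only; only the use of httpx and a SHA-256 digest of the fetched bytes are taken from the record itself.

```python
# Hypothetical sketch, not the actual afls-ingest implementation:
# fetch the source URL, hash the raw response bytes, and timestamp the intake.
import hashlib
from datetime import datetime, timezone

import httpx


def ingest(url: str) -> dict:
    """Fetch a source URL and return basic provenance metadata."""
    response = httpx.get(url, follow_redirects=True, timeout=30.0)
    response.raise_for_status()
    raw = response.content  # raw bytes as served, hashed before any parsing
    return {
        "method": "httpx",
        "at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


if __name__ == "__main__":
    print(ingest("https://arxiv.org/abs/2501.12948"))
```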
Evidence from this source (4)
- weight: 0.85
method: direct_measurement · locator: Abstract
“the trained model achieves superior performance on verifiable tasks such as mathematics, coding competitions, and STEM fields, surpassing its counterparts trained via conventional supervised learning on human demonstrations.”
- weight: 0.85
method: direct_measurement · locator: Abstract
“The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation.”
- weight: 0.90
method: direct_measurement · locator: Abstract
“Here we show that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), obviating the need for human-labeled reasoning trajectories.”
- weight: 0.80
method: direct_measurement · locator: Abstract
“the emergent reasoning patterns exhibited by these large-scale models can be systematically harnessed to guide and enhance the reasoning capabilities of smaller models.”