descriptive claim
According to DeepSeek's R1 experiments, advanced reasoning patterns, including self-reflection, verification, and dynamic strategy adaptation, emerge without explicit supervision when LLMs are trained with pure RL.
desc_r1_emergent_reasoning_patterns
confidence 0.80
Evidence (1)
supports (1)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning direct_measurement weight 0.85
locator: Abstract
“The proposed RL framework facilitates the emergent development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation.”