
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

by DeepSeek · open-source · Last verified 2026-03-17

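DeepSeek-R1 demonstrates that large-scale reinforcement learning with rule-based rewards can incentivize emergent reasoning behaviors in LLMs, including self-verification, reflection, and long chain-of-thought. The paper's DeepSeek-R1-Zero variant is trained with pure RL and no supervised fine-tuning on chain-of-thought data. DeepSeek-R1 itself adds a cold-start fine-tuning stage on a small set of long chain-of-thought examples before RL to improve readability. DeepSeek-R1 achieves performance comparable to OpenAI-o1 on reasoning benchmarks while being fully open-sourced, which triggered a significant industry response.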
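The rule-based reward idea described above can be illustrated with a minimal Python sketch. The paper uses two kinds of rule-based reward, accuracy and format, and optimizes the policy with Group Relative Policy Optimization (GRPO), which scores each sampled response relative to the other responses in its group. Several details here are simplifying assumptions and are not DeepSeek's implementation: the regex template, the exact-string answer check, the equal-weight sum of the two rewards, and the use of the population standard deviation.

```python
import re
import statistics

# Assumed response template: reasoning inside <think> tags, then the final answer
# inside <answer> tags. The regex itself is an illustrative assumption.
THINK_ANSWER = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the <think>/<answer> template, else 0.0."""
    return 1.0 if THINK_ANSWER.match(response.strip()) else 0.0


def accuracy_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference.

    Exact string comparison is a simplification; a real checker would
    normalize math expressions or run test cases for code.
    """
    match = THINK_ANSWER.match(response.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def total_reward(response: str, reference: str) -> float:
    """Equal-weight sum of both rule-based rewards (the weighting is an assumption)."""
    return accuracy_reward(response, reference) + format_reward(response)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalize each reward by the group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every response scored the same, so none is better than the group average.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

For example, sampling four responses to a question whose reference answer is `"4"` gives rewards `[2.0, 1.0, 0.0, 2.0]` for a correct well-formatted answer, a wrong well-formatted answer, a bare unformatted answer, and another correct well-formatted answer. The group-relative advantages then push the policy toward the two correct responses and away from the unformatted one. No learned reward model is needed, because every score comes from a fixed rule.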

https://arxiv.org/abs/2501.12948
Overall grade: B+ (Good)
Adoption: A · Quality: A+ · Freshness: A+ · Citations: A · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: mathematical-reasoning, code-generation, logical-reasoning, self-reflection, long-cot
Integrations: Ollama, Hugging Face, Together AI, Groq
Use Cases: math-problem-solving, code-generation, scientific-reasoning, complex-qa
API Available: Yes
Tags: reasoning, reinforcement-learning, deepseek, chain-of-thought, open-source, 2025
Added: 2026-03-17
Completeness: 100%

Index Score: 74.5

Adoption: 88
Quality: 94
Freshness: 97
Citations: 82
Engagement: 0
