Reasoning models research paper
Top 0.1% on sourcepulse
DeepSeek-R1 is a family of large language models focused on enhancing reasoning capabilities, particularly through reinforcement learning (RL). It offers both large Mixture-of-Experts (MoE) models (DeepSeek-R1-Zero and DeepSeek-R1) and smaller, distilled dense models based on Llama and Qwen architectures, targeting researchers and developers seeking advanced reasoning performance.
How It Works
The core innovation lies in applying RL directly to base models without initial supervised fine-tuning (SFT), enabling emergent reasoning behaviors like self-verification and long chain-of-thought generation. DeepSeek-R1 further refines this with a multi-stage RL and SFT pipeline. Distillation techniques are then used to transfer these reasoning patterns into smaller, more accessible dense models, achieving state-of-the-art results for their size.
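The distillation step amounts to plain supervised fine-tuning on reasoning traces sampled from the large teacher. A minimal sketch of the data-preparation side, where `teacher_generate` is a hypothetical stand-in for sampling DeepSeek-R1 (a real pipeline would call the model):

```python
import json

def teacher_generate(prompt: str) -> str:
    # Hypothetical stand-in: a real pipeline samples a long
    # chain-of-thought completion from the teacher (DeepSeek-R1).
    return "<think>\nstep-by-step reasoning...\n</think>\nfinal answer"

def build_distillation_set(prompts, path="distill.jsonl"):
    # Pair each prompt with a teacher reasoning trace; the smaller dense
    # student (Llama/Qwen) is then fine-tuned on these records with SFT
    # only, no RL stage.
    with open(path, "w") as f:
        for p in prompts:
            record = {"prompt": p, "completion": teacher_generate(p)}
            f.write(json.dumps(record) + "\n")
    return path

path = build_distillation_set(["What is 7 * 8?"])
```

The file layout (`prompt`/`completion` JSONL) is an illustrative choice, not the format used by the authors.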
Quick Start & Requirements
Serve the distilled models with vLLM:

vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

or with SGLang:

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2

Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The base R1 models are not yet directly supported by Hugging Face Transformers. Specific prompt formatting and configuration are recommended for optimal reasoning performance, including prepending "<think>\n" to the model's response so that it begins with an explicit reasoning step rather than skipping it.
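The quick-start commands above expose an OpenAI-compatible endpoint (port 8000 by default for vLLM). A minimal sketch of querying it while forcing the "<think>\n" prefix; the `continue_final_message`/`add_generation_prompt` fields are vLLM extensions to the OpenAI schema (an assumption to verify against your vLLM version), and the endpoint URL assumes a local default deployment:

```python
import json
import urllib.request

SERVER = "http://localhost:8000/v1/chat/completions"  # default vllm serve address

def build_payload(question: str) -> dict:
    # Put the full instruction in the user turn and seed the assistant
    # message with "<think>\n" so the model cannot skip its reasoning
    # phase. continue_final_message / add_generation_prompt are vLLM
    # extensions (assumption; check your server version).
    return {
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": "<think>\n"},
        ],
        "add_generation_prompt": False,
        "continue_final_message": True,
        "temperature": 0.6,
    }

def ask(question: str) -> str:
    # Plain-stdlib POST; swap in the openai client if preferred.
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(build_payload(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Calling `ask(...)` requires one of the servers from the quick-start section to be running; `build_payload` alone shows the recommended request shape.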