DeepSeek-R1 by deepseek-ai

Reasoning models research paper

Created 8 months ago
91,078 stars

Top 0.1% on SourcePulse

Project Summary

DeepSeek-R1 is a family of large language models focused on enhancing reasoning capabilities, particularly through reinforcement learning (RL). It offers both large Mixture-of-Experts (MoE) models (DeepSeek-R1-Zero and DeepSeek-R1) and smaller, distilled dense models based on Llama and Qwen architectures, targeting researchers and developers seeking advanced reasoning performance.

How It Works

The core innovation lies in applying RL directly to base models without initial supervised fine-tuning (SFT), enabling emergent reasoning behaviors like self-verification and long chain-of-thought generation. DeepSeek-R1 further refines this with a multi-stage RL and SFT pipeline. Distillation techniques are then used to transfer these reasoning patterns into smaller, more accessible dense models, achieving state-of-the-art results for their size.

Quick Start & Requirements

  • DeepSeek-R1 Models: Refer to the DeepSeek-V3 repository for local execution details. Hugging Face Transformers is not directly supported.
  • DeepSeek-R1-Distill Models: Can be served with vLLM:

      vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

    or with SGLang:

      python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
  • Prerequisites: Python 3.x, vLLM or SGLang. Large models require significant GPU resources.
  • Usage Recommendations: Set the temperature to 0.5-0.7 (0.6 is recommended), avoid system prompts (put all instructions in the user prompt), and force each response to begin with "<think>\n" so the model does not bypass its reasoning step; see the sketch after this list.
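
Once a distill model is being served, it can be queried through vLLM's OpenAI-compatible endpoint. The following is a minimal sketch, assuming the vllm serve command above is running at its default address (http://localhost:8000/v1) and that the openai Python client is installed; the prompt is illustrative.

    # Minimal sketch: query a locally served distill through vLLM's
    # OpenAI-compatible API (default address assumed; adjust as needed).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        # No system message: all instructions go in the user prompt,
        # per the usage recommendations above.
        messages=[{"role": "user", "content": "Is 221 prime? Reason step by step."}],
        temperature=0.6,  # within the recommended 0.5-0.7 range
        max_tokens=4096,  # leave room for the long chain-of-thought
    )
    print(response.choices[0].message.content)

If the model occasionally skips its thinking block, the repository recommends forcing each response to begin with "<think>\n"; how to inject that prefix depends on the serving stack.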

Highlighted Details

  • DeepSeek-R1 (671B MoE) achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
  • DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini on various benchmarks.
  • Models support context lengths up to 128K tokens.
  • Distilled models are available for Qwen2.5 (1.5B, 7B, 14B, 32B) and Llama (8B from Llama 3.1, 70B from Llama 3.3); see the loading sketch below.
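
Because the distilled checkpoints are standard dense Qwen/Llama models, they can typically also be loaded directly with Hugging Face Transformers (unlike the 671B MoE checkpoints). A minimal sketch using the smallest distill; the model choice and prompt are illustrative:

    # Minimal sketch: run the smallest distill locally with Transformers.
    # Works because the distills are ordinary dense Qwen/Llama checkpoints.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # No system prompt; instructions go in the user turn,
    # per the usage recommendations above.
    messages = [{"role": "user", "content": "How many primes are below 30?"}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=2048, temperature=0.6, do_sample=True)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))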

Maintenance & Community

Licensing & Compatibility

  • MIT License for the code repository and model weights.
  • DeepSeek-R1 series support commercial use and modifications.
  • Distilled models inherit license terms from their base models: Qwen2.5 (Apache 2.0) and Llama 3.1/3.3 (Llama license).

Limitations & Caveats

Hugging Face Transformers is not directly supported for the base R1 MoE models; local execution goes through the DeepSeek-V3 stack instead. Achieving optimal reasoning performance also depends on careful prompt formatting and sampling configuration, including forcing each response to begin with "<think>\n" so the model does not skip its thinking step.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 32
  • Star History: 444 stars in the last 30 days
