DeepSeek-R1 by deepseek-ai

Reasoning models research paper

created 6 months ago
90,707 stars

Top 0.1% on sourcepulse

View on GitHub
Project Summary

DeepSeek-R1 is a family of large language models focused on enhancing reasoning capabilities, particularly through reinforcement learning (RL). It offers both large Mixture-of-Experts (MoE) models (DeepSeek-R1-Zero and DeepSeek-R1) and smaller, distilled dense models based on Llama and Qwen architectures, targeting researchers and developers seeking advanced reasoning performance.

How It Works

The core innovation, demonstrated by DeepSeek-R1-Zero, is applying RL directly to a base model without an initial supervised fine-tuning (SFT) stage, which gives rise to emergent reasoning behaviors such as self-verification and long chain-of-thought generation. DeepSeek-R1 builds on this with a multi-stage pipeline that combines SFT and further RL. Distillation then transfers these reasoning patterns into smaller, more accessible dense models, which achieve state-of-the-art results for their size.
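
In code terms, the distillation step amounts to ordinary supervised fine-tuning of a small dense student on reasoning traces generated by the large teacher. The sketch below illustrates this with Hugging Face Transformers; the student checkpoint, the sample trace, and the hyperparameters are illustrative assumptions, not the project's actual pipeline.

    # Conceptual sketch: "distillation" here is plain supervised fine-tuning of a
    # small dense student on teacher-generated chain-of-thought data.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    student_name = "Qwen/Qwen2.5-1.5B"  # illustrative student base model
    tokenizer = AutoTokenizer.from_pretrained(student_name)
    student = AutoModelForCausalLM.from_pretrained(student_name)

    # One teacher-generated sample: a prompt followed by a reasoning trace that
    # ends in the final answer.
    prompt = "What is 17 * 24?"
    trace = "<think>\n17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408\n</think>\n408"

    # A standard causal-LM loss over the concatenated text pushes the teacher's
    # reasoning pattern into the student's weights.
    batch = tokenizer(prompt + "\n" + trace, return_tensors="pt")
    optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

    student.train()
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()

In practice this runs over a large corpus of curated teacher samples (the report describes roughly 800K), but the training objective is the same.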

Quick Start & Requirements

  • DeepSeek-R1 Models: Refer to the DeepSeek-V3 repository for local execution details. Hugging Face Transformers is not directly supported.
  • DeepSeek-R1-Distill Models: Can be served using vLLM (vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager) or SGLang (python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2).
  • Prerequisites: Python 3.x, vLLM or SGLang. Large models require significant GPU resources.
  • Usage Recommendations: Set the temperature to 0.5-0.7 (0.6 is recommended), avoid system prompts, and enforce that the model begins its response with "<think>\n" so it reliably produces a reasoning trace; a minimal query sketch follows this list.
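
Once a distilled model is served with vLLM as above, it exposes an OpenAI-compatible HTTP endpoint (http://localhost:8000/v1 by default; SGLang serves a similar API on its own port). The following is a minimal query sketch applying the usage recommendations; the base URL, API key placeholder, and prompt are illustrative assumptions.

    # Minimal sketch: query a vLLM-served distilled model through its
    # OpenAI-compatible endpoint. Adjust base_url for your deployment.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # Per the recommendations: no system prompt, temperature in the 0.5-0.7 range.
    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
        messages=[
            {"role": "user", "content": "Prove that the sum of two even integers is even."}
        ],
        temperature=0.6,
        max_tokens=2048,
    )
    print(response.choices[0].message.content)

If the model skips its thinking pattern (emitting an empty "<think>...</think>" block), forcing the response to begin with "<think>\n", for example by using the raw completions endpoint with the chat template applied manually, restores the expected behavior.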

Highlighted Details

  • DeepSeek-R1 (671B MoE) achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
  • DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini on various benchmarks.
  • Models support context lengths up to 128K tokens.
  • Distilled models are available for Qwen2.5 (1.5B, 7B, 14B, 32B) and Llama 3.1/3.3 (8B, 70B).

Licensing & Compatibility

  • MIT License for the code repository and model weights.
  • DeepSeek-R1 series support commercial use and modifications.
  • Distilled models inherit licenses from their base models: Qwen2.5 (Apache 2.0) and Llama 3.1/3.3 (Llama license).

Limitations & Caveats

Hugging Face Transformers is not directly supported for the base R1 models; local inference requires the workflow described in the DeepSeek-V3 repository. Specific prompt formatting and sampling settings are also recommended for optimal reasoning performance, including enforcing that the model begins its response with "<think>\n" so it does not skip its reasoning steps.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 36
  • Star History: 2,549 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai
Top 0.1% · 5k stars
MoE language model for research/API use
created 1 year ago · updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai
Top 0.4% · 6k stars
Open-source code language model comparable to GPT4-Turbo
created 1 year ago · updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface
Top 0.2% · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 3 days ago