SDK for reproducing DeepSeek-R1
This repository provides a fully open reproduction of the DeepSeek-R1 large language model, aiming to democratize access to advanced reasoning capabilities. It's designed for researchers and developers seeking to replicate, understand, and build upon state-of-the-art reasoning models.
How It Works
The project follows the DeepSeek-R1 technical report and breaks the reproduction into three stages: distilling a high-quality corpus from DeepSeek-R1, replicating its pure RL pipeline (likely involving new large-scale datasets for math, reasoning, and code), and demonstrating multi-stage training that goes from a base model to an RL-tuned model. It uses Hugging Face's accelerate for distributed training and vLLM for efficient inference, and supports both Supervised Fine-Tuning (SFT) and GRPO (Group Relative Policy Optimization), a reinforcement learning method derived from Proximal Policy Optimization (PPO).
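To make the GRPO step more concrete, the following is a minimal sketch of the group-relative advantage that gives the method its name: several completions are sampled for the same prompt, each is scored, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration; this is not the repository's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Illustrative only: `eps` and the population standard deviation are
    assumptions, not values taken from the repository.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions that beat the group average get a positive advantage,
    # those below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions for one prompt, scored 1.0 if correct.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline comes from the group itself, no separate value model is needed, which is the main practical difference from standard PPO.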
Quick Start & Requirements
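Create and activate a virtual environment with uv, then install the dependencies in this order:
uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip
uv pip install vllm==0.8.4 flash-attn --no-build-isolation
pip install -e .[dev]
Then log in to Hugging Face and Weights & Biases with huggingface-cli login and wandb login.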
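After installation, a quick way to confirm which versions actually landed in the environment is to query the package metadata. The sketch below uses only the standard library; the helper name and the list of packages checked are assumptions chosen for illustration, and the distribution names may differ slightly on some setups.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a mapping of package name to installed version (None if missing)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Hypothetical list of packages worth checking after the steps above.
    for name, ver in installed_versions(["vllm", "flash-attn", "accelerate", "wandb"]).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If vllm does not report 0.8.4, the pinned install step probably did not run in the active environment.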
Highlighted Details
Maintenance & Community
This is an active, community-driven project. Contributions are welcomed via GitHub issues. The README acknowledges related tools such as vLLM and SGLang.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is explicitly marked as "a work in progress." Some installation steps and configurations are highly specific to 8x H100 GPU setups and may require significant adaptation for other hardware. The README notes potential discrepancies in evaluation results compared to DeepSeek's reported figures due to sampling differences.
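The effect of sampling on reported scores can be illustrated with a toy simulation. The sketch below estimates pass@1 as the fraction of sampled answers that are correct, averaged over problems; the per-problem solve probabilities are invented for the example and the function is hypothetical, not part of the repository's evaluation code.

```python
import random


def estimate_pass_at_1(solve_probs, num_samples, seed):
    """Estimate pass@1 by sampling `num_samples` answers per problem.

    `solve_probs` holds a made-up probability that a single sampled
    answer to each problem is correct.
    """
    rng = random.Random(seed)
    per_problem = []
    for p in solve_probs:
        correct = sum(rng.random() < p for _ in range(num_samples))
        per_problem.append(correct / num_samples)
    return sum(per_problem) / len(per_problem)


# Hypothetical benchmark of five problems with varying difficulty.
solve_probs = [0.9, 0.7, 0.5, 0.3, 0.1]
few_samples = estimate_pass_at_1(solve_probs, num_samples=1, seed=0)
many_samples = estimate_pass_at_1(solve_probs, num_samples=64, seed=0)
```

With a single sample per problem the estimate can only take a few coarse values and shifts noticeably between seeds, while a larger sample count gives a steadier figure. Differences in sample count, temperature, or seed are therefore enough to move a score by a few points.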