open-r1 by huggingface

SDK for reproducing DeepSeek-R1

created 6 months ago
25,159 stars

Top 1.6% on sourcepulse

Project Summary

This repository provides a fully open reproduction of the DeepSeek-R1 large language model, aiming to democratize access to advanced reasoning capabilities. It's designed for researchers and developers seeking to replicate, understand, and build upon state-of-the-art reasoning models.

How It Works

The project follows the DeepSeek-R1 technical report and breaks reproduction into three stages: distilling a high-quality reasoning corpus from DeepSeek-R1, replicating its pure reinforcement-learning pipeline (likely requiring new large-scale datasets for math, reasoning, and code), and demonstrating multi-stage training from a base model to an RL-tuned version. It leverages Hugging Face's accelerate for distributed training and vLLM for efficient inference, and supports both Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), a PPO-style reinforcement-learning algorithm.
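The GRPO step mentioned above dispenses with PPO's learned value network: for each prompt, a group of completions is sampled, and each completion's reward is normalized against its group's statistics. A minimal sketch of that group-relative advantage computation (the reward values are toy inputs, not project code):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: normalize each sampled
    completion's reward by the mean and std of its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against uniform-reward groups
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt, scored by a hypothetical
# verifier (1.0 = correct final answer, 0.0 = wrong).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

Completions with above-average reward in their group get positive advantages and are reinforced; below-average ones are penalized, all without a critic model.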

Quick Start & Requirements

  • Installation: Requires Python 3.11 and CUDA 12.4. Install dependencies via uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip, followed by uv pip install vllm==0.8.4 flash-attn --no-build-isolation, and then pip install -e .[dev].
  • Authentication: Log in to Hugging Face Hub and Weights & Biases using huggingface-cli login and wandb login.
  • Prerequisites: Git LFS must be installed. Training commands are optimized for 8x H100 GPUs, requiring adjustments for different hardware.
  • Documentation: Installation Guide, Training, Evaluation, and Data Generation details are available within the README.
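Collected into one snippet, the installation and authentication steps listed above look like this (versions and extras exactly as stated in the summary; verify against the repository's README before use):

```shell
# Create and activate a Python 3.11 virtual environment with uv
uv venv openr1 --python 3.11
source openr1/bin/activate
uv pip install --upgrade pip

# Pinned vLLM plus flash-attn (a CUDA 12.4 toolchain is expected)
uv pip install vllm==0.8.4 flash-attn --no-build-isolation

# Install open-r1 in editable mode with dev extras
pip install -e .[dev]

# Authenticate with the Hugging Face Hub and Weights & Biases
huggingface-cli login
wandb login
```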

Highlighted Details

  • Released the CodeForces-CoTs dataset (10k problems, 100k solutions) and the IOI24 benchmark.
  • Released OpenR1-Math-220k dataset, enabling models to match DeepSeek's distilled performance.
  • Implemented core training, inference, and evaluation pipelines.
  • Supports training with DeepSpeed (ZeRO-2/3) and DDP.
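For the DeepSpeed ZeRO support noted above, multi-GPU runs are typically launched through accelerate with a ZeRO config file. The paths below are illustrative assumptions, not commands copied from the README:

```shell
# Hypothetical paths: check the repository's recipes/ directory for the real ones.
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py --config recipes/sft_config.yaml
```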

Maintenance & Community

This is an active, community-driven project; contributions are welcomed via GitHub issues, and related tooling such as vLLM and SGLang is acknowledged.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is explicitly marked as "a work in progress." Some installation steps and configurations are highly specific to 8x H100 GPU setups and may require significant adaptation for other hardware. The README notes potential discrepancies in evaluation results compared to DeepSeek's reported figures due to sampling differences.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 5
  • Star History: 1,061 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai (top 0.2% on sourcepulse, 40k stars)
Deep learning optimization library for distributed training and inference
created 5 years ago, updated 21 hours ago