open-r1 by huggingface

SDK for reproducing DeepSeek-R1

created 6 months ago
25,159 stars

Top 1.6% on sourcepulse

Project Summary

This repository provides a fully open reproduction of the DeepSeek-R1 large language model, aiming to democratize access to advanced reasoning capabilities. It's designed for researchers and developers seeking to replicate, understand, and build upon state-of-the-art reasoning models.

How It Works

The project follows the DeepSeek-R1 technical report and breaks reproduction into three stages: distilling a high-quality reasoning corpus from DeepSeek-R1, replicating its pure reinforcement-learning pipeline (likely requiring new large-scale datasets for math, reasoning, and code), and demonstrating multi-stage training from a base model to an RL-tuned version. It leverages Hugging Face's accelerate for distributed training and vLLM for efficient inference, and supports both Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), a PPO-style reinforcement-learning algorithm.
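The GRPO step mentioned above dispenses with PPO's learned value network: for each prompt, a group of completions is sampled, and each completion's reward is normalized against its group's statistics. A minimal sketch of that group-relative advantage computation (the reward values are toy inputs, not project code):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: normalize each sampled
    completion's reward by the mean and std of its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against uniform-reward groups
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt, scored by a hypothetical
# verifier (1.0 = correct final answer, 0.0 = wrong).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

Completions with above-average reward in their group get positive advantages and are reinforced; below-average ones are penalized, all without a critic model.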

Quick Start & Requirements

  • Installation: Requires Python 3.11 and CUDA 12.4. Install dependencies via uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip, followed by uv pip install vllm==0.8.4 flash-attn --no-build-isolation, and then pip install -e .[dev].
  • Authentication: Log in to Hugging Face Hub and Weights & Biases using huggingface-cli login and wandb login.
  • Prerequisites: Git LFS must be installed. Training commands are optimized for 8x H100 GPUs, requiring adjustments for different hardware.
  • Documentation: Installation Guide, Training, Evaluation, and Data Generation details are available within the README.
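Collected into one snippet, the installation and authentication steps listed above look like this (versions and extras exactly as stated in the summary; verify against the repository's README before use):

```shell
# Create and activate a Python 3.11 virtual environment with uv
uv venv openr1 --python 3.11
source openr1/bin/activate
uv pip install --upgrade pip

# Pinned vLLM plus flash-attn (a CUDA 12.4 toolchain is expected)
uv pip install vllm==0.8.4 flash-attn --no-build-isolation

# Install open-r1 in editable mode with dev extras
pip install -e .[dev]

# Authenticate with the Hugging Face Hub and Weights & Biases
huggingface-cli login
wandb login
```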

Highlighted Details

  • Released the CodeForces-CoTs dataset (10k problems, 100k solutions) and the IOI24 benchmark.
  • Released OpenR1-Math-220k dataset, enabling models to match DeepSeek's distilled performance.
  • Implemented core training, inference, and evaluation pipelines.
  • Supports training with DeepSpeed (ZeRO-2/3) and DDP.
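For the DeepSpeed ZeRO support noted above, multi-GPU runs are typically launched through accelerate with a ZeRO config file. The paths below are illustrative assumptions, not commands copied from the README:

```shell
# Hypothetical paths: check the repository's recipes/ directory for the real ones.
accelerate launch --config_file recipes/accelerate_configs/zero3.yaml \
    src/open_r1/sft.py --config recipes/sft_config.yaml
```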

Maintenance & Community

This is an active, community-driven project; contributions are welcomed via GitHub issues, and related tooling such as vLLM and SGLang is acknowledged.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is explicitly marked as "a work in progress." Some installation steps and configurations are highly specific to 8x H100 GPU setups and may require significant adaptation for other hardware. The README notes potential discrepancies in evaluation results compared to DeepSeek's reported figures due to sampling differences.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 5
  • Star History: 1,061 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai (top 0.2% on sourcepulse, 40k stars)
Deep learning optimization library for distributed training and inference
created 5 years ago, updated 21 hours ago