open-r1 by huggingface

SDK for reproducing DeepSeek-R1

Created 7 months ago
25,436 stars

Top 1.5% on SourcePulse

Project Summary

This repository provides a fully open reproduction of the DeepSeek-R1 large language model, aiming to democratize access to advanced reasoning capabilities. It's designed for researchers and developers seeking to replicate, understand, and build upon state-of-the-art reasoning models.

How It Works

The project follows the DeepSeek-R1 technical report and splits the reproduction into three stages: distilling a high-quality corpus from DeepSeek-R1; replicating its pure RL pipeline, which will likely require curating new large-scale datasets for math, reasoning, and code; and demonstrating multi-stage training from a base model to an RL-tuned one. It uses Hugging Face's accelerate for distributed training and vLLM for efficient inference, and supports both Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO).
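
To make the GRPO step concrete, here is a minimal sketch of the group-relative advantage that gives the method its name: several completions are sampled for one prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is an illustration, not code from the repository; the function name, the eps value, and the use of the population standard deviation are assumptions.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one group of completions sampled for the same prompt.

    Hypothetical helper: A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    if not rewards:
        raise ValueError("need at least one reward per group")
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored 1.0 when the final answer is correct.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average receive a positive advantage and are reinforced. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.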

Quick Start & Requirements

  • Installation: Requires Python 3.11 and CUDA 12.4. Create and activate a virtual environment with uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip, then run uv pip install vllm==0.8.4 flash-attn --no-build-isolation, and finally pip install -e ".[dev]" (the quotes keep shells such as zsh from treating the brackets as a glob pattern).
  • Authentication: Log in to Hugging Face Hub and Weights & Biases using huggingface-cli login and wandb login.
  • Prerequisites: Git LFS must be installed. The training commands are tuned for a node with 8x H100 GPUs and need to be adjusted for other hardware.
  • Documentation: Installation Guide, Training, Evaluation, and Data Generation details are available within the README.
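
The requirements above can be checked before launching a job with a small preflight script. The following is a rough sketch, not part of the repository: the function name and the list of tools are assumptions drawn from the steps above, it is meant to run inside the activated environment after installation, and it does not verify the CUDA version.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 11)
# Executables named in the steps above; "git-lfs" is the binary behind `git lfs`.
REQUIRED_TOOLS = ("uv", "git-lfs", "huggingface-cli", "wandb")


def preflight(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.11 required, found {found}")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The version and lookup function are passed as parameters so the check can be exercised without changing the real environment.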

Highlighted Details

  • Released new datasets: CodeForces-CoTs (10k problems, 100k solutions) and IOI24 benchmark.
  • Released OpenR1-Math-220k dataset, enabling models to match DeepSeek's distilled performance.
  • Implemented core training, inference, and evaluation pipelines.
  • Supports training with DeepSpeed (ZeRO-2/3) and DDP.

Maintenance & Community

This is an active, community-driven project. Contributions are welcome via GitHub issues. The README acknowledges related tools such as vLLM and SGLang.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is explicitly marked as "a work in progress." Some installation steps and configurations are highly specific to 8x H100 GPU setups and may require significant adaptation for other hardware. The README notes potential discrepancies in evaluation results compared to DeepSeek's reported figures due to sampling differences.
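
One common adaptation when moving a recipe tuned for 8 GPUs to a smaller machine is to raise gradient accumulation so that the global batch size stays the same. The sketch below shows that arithmetic; the helper name and the batch sizes are illustrative assumptions, not values taken from the repository's configs.

```python
def grad_accum_steps(global_batch, per_device_batch, num_gpus):
    """Gradient accumulation steps that keep the global batch size fixed.

    global_batch = per_device_batch * num_gpus * grad_accum_steps
    """
    per_step = per_device_batch * num_gpus
    if per_step <= 0:
        raise ValueError("per_device_batch and num_gpus must be positive")
    if global_batch % per_step != 0:
        raise ValueError(f"global batch {global_batch} is not a multiple of {per_step}")
    return global_batch // per_step


if __name__ == "__main__":
    # Hypothetical recipe: 8 GPUs x 16 samples per device x 1 accumulation step = 128.
    reference = 8 * 16 * 1
    for gpus in (8, 4, 2, 1):
        print(gpus, "GPU(s) ->", grad_accum_steps(reference, 16, gpus), "accumulation step(s)")
```

If the replacement GPUs also have less memory, the per-device batch size can be lowered and the accumulation steps raised by the same factor.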

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 3
  • Issues (30d): 3

Star History

  • 200 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

3.4%
1k
Framework for scaling multimodal model training across accelerators
Created 5 months ago
Updated 3 weeks ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch

0.1%
5k
LLM research codebase for training and inference
Created 11 months ago
Updated 2 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch

0.7%
4k
PyTorch platform for generative AI model training research
Created 1 year ago
Updated 19 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.2%
7k
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 days ago