train-deepseek-r1 by FareedKhan-dev

Replicate DeepSeek R1 LLM training from scratch

created 5 months ago
667 stars

Top 51.4% on sourcepulse

View on GitHub
Project Summary

This repository provides a step-by-step guide and code to replicate the DeepSeek R1 reasoning model training process. It targets engineers and researchers interested in understanding and implementing advanced reinforcement learning techniques for LLMs, specifically focusing on improving reasoning capabilities. The project aims to demystify the complex training pipeline of DeepSeek R1 by offering a practical, code-driven explanation with simplified components.

How It Works

The project breaks the DeepSeek R1 training down into manageable stages, starting with a GRPO (Group Relative Policy Optimization) approach for an initial "R1 Zero" model. This stage uses a smaller base model (Qwen2.5-0.5B-Instruct) and applies multiple reward functions (accuracy, format, reasoning steps, cosine scaling, repetition penalty) to guide learning. Following this, it details Supervised Fine-Tuning (SFT) on curated datasets such as Bespoke-Stratos-17k to improve reasoning clarity and language consistency, addressing issues found in R1 Zero. The theoretical aspects of subsequent RL stages and distillation are also covered.
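
As a concrete illustration of the GRPO stage, the sketch below wires a toy format reward into trl's GRPOTrainer. It is a minimal sketch, not the repository's code: the prompts, reward logic, and hyperparameters are illustrative assumptions.

```python
# Minimal GRPO sketch (illustrative assumptions, not the repository's code).
# Requires trl >= 0.14, which ships GRPOTrainer/GRPOConfig.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt set; the repository prepares a real math dataset instead.
train_dataset = Dataset.from_dict({
    "prompt": [
        "What is 12 * 7? Show your reasoning.",
        "A train travels 60 km in 45 minutes. What is its speed in km/h?",
    ]
})

def format_reward(completions, **kwargs):
    # Toy stand-in for the repo's format reward: +1 when the completion
    # wraps its output in <think>/<answer> tags, 0 otherwise.
    return [
        1.0 if "<think>" in c and "<answer>" in c else 0.0
        for c in completions
    ]

training_args = GRPOConfig(
    output_dir="qwen2.5-0.5b-r1-zero",
    num_generations=8,          # completions sampled per prompt (the GRPO "group")
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=[format_reward],  # the repo stacks several rewards here
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

Accuracy, reasoning-step, cosine-scaling, and repetition-penalty rewards plug in the same way: each is another callable in reward_funcs, and their scores are summed per completion.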

Quick Start & Requirements

  • Install: Clone the repository and run pip install -r requirements.txt. Estimated setup time: 15-30 minutes.
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, and the TRL library. A GPU is recommended for training.
  • Resources: The project uses a small base model (Qwen2.5-0.5B-Instruct), making it runnable on consumer hardware with a GPU (see the loading sketch after this list).
  • Links: GitHub Repository
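
For orientation, loading the base model follows the standard Hugging Face pattern; this is a generic snippet, not code from the repository.

```python
# Load the small base model the guide builds on (generic snippet,
# not taken from the repository).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # ~1 GB of weights; fits a consumer GPU
    device_map="auto",
)

# Quick smoke test using the model's chat template.
messages = [{"role": "user", "content": "What is 7 * 8? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tokenizer.decode(model.generate(inputs, max_new_tokens=128)[0]))
```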

Highlighted Details

  • Implements GRPO for initial model training with multiple custom reward functions.
  • Demonstrates Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) and direct prompting techniques (a minimal SFT sketch follows this list).
  • Utilizes Hugging Face datasets and trl libraries for efficient data handling and training.
  • Explains theoretical concepts like rejection sampling and distillation for model refinement.
  • Provides code examples for each stage, including reward function implementations.
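
The SFT stage can be sketched with trl's SFTTrainer as below. This is a hedged illustration, not the repository's exact code: the column conversion assumes the published Bespoke-Stratos-17k schema (a "system" field plus ShareGPT-style conversations), and the hyperparameters are placeholders.

```python
# Minimal SFT sketch on Bespoke-Stratos-17k (illustrative assumptions,
# not the repository's exact code).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("bespokelabs/Bespoke-Stratos-17k", split="train")

def to_messages(example):
    # Assumes the dataset's published schema: a "system" string plus
    # ShareGPT-style {"from", "value"} turns under "conversations".
    role_map = {"human": "user", "gpt": "assistant"}
    messages = [{"role": "system", "content": example["system"]}]
    messages += [
        {"role": role_map.get(turn["from"], turn["from"]),
         "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return {"messages": messages}

dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

training_args = SFTConfig(
    output_dir="qwen2.5-0.5b-sft",
    per_device_train_batch_size=2,   # placeholder values
    gradient_accumulation_steps=8,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

With a "messages" column in this chat format, SFTTrainer applies the model's chat template automatically, so no manual prompt formatting is needed.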

Maintenance & Community

  • The repository is maintained by FareedKhan-dev.
  • No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the provided README snippet. Code examples are generally permissive, but the underlying models (like Qwen) have their own licenses.

Limitations & Caveats

  • The project focuses on replicating the process and theory of DeepSeek R1, using a smaller base model and simplified datasets. It does not claim to achieve the exact performance of the original DeepSeek R1.
  • Some advanced stages like final RL alignment and distillation are described theoretically rather than fully implemented.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 91 stars in the last 90 days

Explore Similar Projects

Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 5 more.

TinyZero by Jiayi-Pan

Top 0.2% on sourcepulse · 12k stars
Minimal reproduction of DeepSeek R1 Zero for countdown/multiplication tasks
created 6 months ago · updated 3 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

Top 0.2% on sourcepulse · 25k stars
SDK for reproducing DeepSeek-R1
created 6 months ago · updated 4 days ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of Build a Large Language Model From Scratch), and 6 more.

DeepSeek-R1 by deepseek-ai

Top 0.1% on sourcepulse · 91k stars
Reasoning models research paper
created 6 months ago · updated 1 month ago