train-deepseek-r1 by FareedKhan-dev

Replicate DeepSeek R1 LLM training from scratch

Created 7 months ago
698 stars

Top 48.9% on SourcePulse

View on GitHub
Project Summary

This repository provides a step-by-step guide and code to replicate the DeepSeek R1 reasoning model training process. It targets engineers and researchers interested in understanding and implementing advanced reinforcement learning techniques for LLMs, specifically focusing on improving reasoning capabilities. The project aims to demystify the complex training pipeline of DeepSeek R1 by offering a practical, code-driven explanation with simplified components.

How It Works

The project breaks down the DeepSeek R1 training pipeline into manageable stages, starting with a GRPO (Group Relative Policy Optimization) based approach for an initial "R1 Zero" model. This stage uses a small base model (Qwen2.5-0.5B-Instruct) and multiple reward functions (accuracy, format, reasoning steps, cosine scaling, repetition penalty) to guide learning. It then details Supervised Fine-Tuning (SFT) on curated datasets such as Bespoke-Stratos-17k to improve reasoning clarity and language consistency, addressing issues observed in R1 Zero. The subsequent RL stages and distillation are covered at a theoretical level.
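To make the reward shaping concrete, below is a minimal sketch of a format reward in the style that TRL-compatible reward functions follow (one score per completion). The <think>/<answer> tag pattern and the scoring values are assumptions for illustration, not the repository's exact implementation.

```python
import re

# Minimal sketch of a format reward (assumed tags and scoring, not the
# repository's exact implementation). A TRL-style reward function receives
# the batch of generated completions and returns one float per completion.
THINK_ANSWER_PATTERN = re.compile(
    r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL
)

def format_reward(completions, **kwargs):
    """Give 1.0 when a completion wraps its reasoning and answer in tags."""
    rewards = []
    for completion in completions:
        # Completions may arrive as plain strings or chat-style message lists.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        rewards.append(1.0 if THINK_ANSWER_PATTERN.search(text) else 0.0)
    return rewards

# Example: two completions, only the first follows the expected format.
print(format_reward([
    "<think>2 + 2 = 4</think> <answer>4</answer>",
    "The answer is 4.",
]))  # -> [1.0, 0.0]
```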

Quick Start & Requirements

  • Install: Clone the repository and run pip install -r requirements.txt.
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, TRL library. GPU recommended for training.
  • Setup: Clone repository, install dependencies. Estimated setup time: 15-30 minutes.
  • Resources: The project uses a small base model (Qwen2.5-0.5B-Instruct), making it runnable on consumer hardware with a GPU; a quick load test is sketched after this list.
  • Links: GitHub Repository
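
As a quick setup check after installing the dependencies, a minimal sketch (assuming the public Hugging Face checkpoint Qwen/Qwen2.5-0.5B-Instruct) loads the base model and generates a short answer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Setup check: load the small base model used throughout the guide and
# generate a few tokens. The checkpoint id is the public Hugging Face one;
# the repository may wrap this step differently.
model_id = "Qwen/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What is 2 + 2? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```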

Highlighted Details

  • Implements GRPO for initial model training with multiple custom reward functions.
  • Demonstrates Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) and direct prompting techniques.
  • Utilizes Hugging Face datasets and trl libraries for efficient data handling and training (wired together in the sketch after this list).
  • Explains theoretical concepts like rejection sampling and distillation for model refinement.
  • Provides code examples for each stage, including reward function implementations.
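
As a rough picture of how these pieces connect, the sketch below wires a custom reward function into trl's GRPOTrainer for the R1-Zero-style stage and runs a small SFTTrainer pass for the SFT stage. The tiny in-memory datasets, placeholder reward, and hyperparameters are assumptions for illustration, not the repository's actual data or settings.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer, SFTConfig, SFTTrainer

MODEL_ID = "Qwen/Qwen2.5-0.5B-Instruct"

def answer_tag_reward(completions, **kwargs):
    # Placeholder reward: 1.0 if the completion closes an <answer> tag.
    return [1.0 if "</answer>" in c else 0.0 for c in completions]

# Stage 1 sketch: GRPO with custom reward functions. GRPOTrainer expects a
# dataset with a "prompt" column; two toy prompts stand in for the real data.
grpo_trainer = GRPOTrainer(
    model=MODEL_ID,
    reward_funcs=[answer_tag_reward],  # add accuracy/format/repetition rewards here
    args=GRPOConfig(output_dir="r1-zero", num_generations=4, max_completion_length=256),
    train_dataset=Dataset.from_dict({
        "prompt": [
            "What is 2 + 2? Put the final result in <answer> tags.",
            "What is 3 * 5? Put the final result in <answer> tags.",
        ]
    }),
)
grpo_trainer.train()

# Stage 2 sketch: SFT on chat-formatted reasoning data (a single toy example
# here; the guide uses a curated dataset such as Bespoke-Stratos-17k).
sft_trainer = SFTTrainer(
    model=MODEL_ID,
    args=SFTConfig(output_dir="r1-sft"),
    train_dataset=Dataset.from_dict({
        "messages": [[
            {"role": "user", "content": "What is 2 + 2?"},
            {"role": "assistant", "content": "<think>2 + 2 = 4</think> <answer>4</answer>"},
        ]]
    }),
)
sft_trainer.train()
```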

Maintenance & Community

  • The repository is maintained by FareedKhan-dev.
  • No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

  • The README snippet provided does not explicitly state a license for the repository's code. The underlying models (e.g., Qwen2.5-0.5B-Instruct) are distributed under their own licenses, which apply regardless of this repository's terms.

Limitations & Caveats

  • The project focuses on replicating the process and theory of DeepSeek R1, using a smaller base model and simplified datasets. It does not claim to achieve the exact performance of the original DeepSeek R1.
  • Some advanced stages like final RL alignment and distillation are described theoretically rather than fully implemented.
Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

16 stars in the last 30 days
