Replicate DeepSeek R1 LLM training from scratch
This repository provides a step-by-step guide and code to replicate the DeepSeek R1 reasoning model training process. It targets engineers and researchers interested in understanding and implementing advanced reinforcement learning techniques for LLMs, specifically focusing on improving reasoning capabilities. The project aims to demystify the complex training pipeline of DeepSeek R1 by offering a practical, code-driven explanation with simplified components.
How It Works
The project breaks down the DeepSeek R1 training into manageable stages, starting with a GRPO (Group Relative Policy Optimization) based approach for an initial "R1 Zero" model. This stage uses a smaller base model (Qwen2.5-0.5B-Instruct) and applies multiple reward functions (accuracy, format, reasoning steps, cosine scaling, repetition penalty) to guide learning. It then details Supervised Fine-Tuning (SFT) on curated datasets such as Bespoke-Stratos-17k to improve reasoning clarity and language consistency, addressing issues observed in R1 Zero. The theoretical aspects of the subsequent RL stages and distillation are also covered.
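As a rough illustration of how one of these rewards can plug into a GRPO loop, here is a minimal sketch using trl's GRPOTrainer with a format reward that checks for `<think>`/`<answer>` tags. The tag pattern, the placeholder prompt dataset, and the hyperparameters are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal sketch (not the repository's exact code): a format-following reward
# wired into trl's GRPOTrainer for an "R1 Zero"-style run.
import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    """Return 1.0 for completions that follow <think>...</think><answer>...</answer>, else 0.0."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]

# Placeholder prompt dataset with a "prompt" column; the real project would use
# its own reasoning prompts instead.
train_dataset = load_dataset("trl-lib/tldr", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=[format_reward],  # accuracy, reasoning-step, cosine, and repetition rewards would be added here
    args=GRPOConfig(
        output_dir="r1-zero-sketch",
        num_generations=4,               # completions sampled per prompt for the group-relative advantage
        per_device_train_batch_size=4,   # must be divisible by num_generations
    ),
    train_dataset=train_dataset,
)
trainer.train()
```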
Quick Start & Requirements
pip install -r requirements.txt
Highlighted Details
Uses the datasets and trl libraries for efficient data handling and training.
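As a sketch of how that pairing looks for the SFT stage, the snippet below loads Bespoke-Stratos-17k with datasets and fine-tunes the base model with trl's SFTTrainer. The dataset ID, the ShareGPT-style column names, and the hyperparameters are assumptions about the data schema rather than the repository's exact configuration.

```python
# Minimal SFT sketch (assumptions, not the repository's exact code):
# fine-tune the small base model on curated reasoning traces.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed Hugging Face dataset ID for Bespoke-Stratos-17k.
dataset = load_dataset("bespokelabs/Bespoke-Stratos-17k", split="train")

def to_messages(example):
    """Convert ShareGPT-style turns to the chat "messages" format SFTTrainer expects.
    The "conversations"/"from"/"value" column names are assumptions about the dataset schema."""
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    messages = [
        {"role": role_map.get(turn["from"], "user"), "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return {"messages": messages}

dataset = dataset.map(to_messages, remove_columns=dataset.column_names)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="r1-sft-sketch",
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
)
trainer.train()
```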
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats