X-R1  by dhcode-cpp

RL framework for low-cost training of 0.5B+ models

created 5 months ago
763 stars

Top 46.5% on sourcepulse

Project Summary

X-R1 is a low-cost, end-to-end reinforcement learning framework for training large language models, aimed at improving reasoning and format-following. It targets researchers and developers who want to fine-tune models such as Qwen, from 0.5B to 7B parameters, on limited hardware. The project claims that a 0.5B model can be trained in about an hour for under $7.

How It Works

The framework uses GRPO (Group Relative Policy Optimization), a variant of Proximal Policy Optimization (PPO) that replaces the learned value function with advantages computed relative to a group of sampled completions. It supports LoRA and DeepSpeed ZeRO Stage 3 (ZeRO-3) to reduce the memory footprint and training time. The approach aims to elicit an "Aha Moment" by optimizing explicit reward signals, so that the model reasons better and adheres to the required output format.
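The core of the description above can be sketched in a few lines. This is a minimal illustration, not X-R1's actual code: the `<think>`/`<answer>` template, the function names, and the use of a population standard deviation are all assumptions made for the example. It shows a binary format reward and the group-relative normalization that gives GRPO its name.

```python
import re
import statistics

# Assumed output template (hypothetical): reasoning inside <think>,
# final answer inside <answer>. The real project may use a different one.
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so no
    learned value function is needed. Population std is an assumption here.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One group: four sampled completions for the same prompt.
completions = [
    "<think>2 + 2 = 4</think> <answer>4</answer>",
    "The answer is 4.",
    "<think>guess</think><answer>5</answer>",
    "<answer>4</answer>",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that follow the template receive a positive advantage and the others a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.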

Quick Start & Requirements

  • Install:
      conda create -n xr1 python=3.11
      conda activate xr1
      pip install -r requirements.txt
      pip install flash-attn
  • Prerequisites: CUDA >= 12.4.
  • Setup: Create the Conda environment and install the dependencies, then start training with accelerate launch and the configuration file for the chosen setup.
  • Links: wandb details, Colab Inference, Models.
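
Before installing, it can help to confirm that the environment meets the CUDA >= 12.4 prerequisite. The following is a rough sketch under stated assumptions: the helper names are hypothetical, and `torch.version.cuda` reports the CUDA version PyTorch was built against (or `None` for CPU-only builds), which is not necessarily the driver's version.

```python
def parse_version(text: str) -> tuple:
    """Turn '12.4' or '12.4.1' into (12, 4) so versions compare numerically."""
    parts = (text.split(".") + ["0"])[:2]  # pad so a bare '12' becomes (12, 0)
    return int(parts[0]), int(parts[1])


def meets_cuda_requirement(cuda_version, minimum="12.4") -> bool:
    """Return True if cuda_version is at least the minimum; None means no CUDA."""
    if cuda_version is None:
        return False
    return parse_version(cuda_version) >= parse_version(minimum)


def installed_cuda_version():
    """Return the CUDA version of the installed PyTorch build, if any."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    version = installed_cuda_version()
    print(f"CUDA version seen by PyTorch: {version}")
    print(f"Meets the 12.4 prerequisite: {meets_cuda_requirement(version)}")
```

Comparing integer tuples rather than strings avoids the usual pitfall where "12.10" sorts before "12.4".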

Highlighted Details

  • Trains 0.5B models in about 1 hour on 4× RTX 3090/4090 GPUs for under $7.
  • Supports scaling to larger models (1.5B, 7B, 32B) with provided datasets.
  • Includes benchmark evaluation results for MATH500 and Chinese math reasoning.
  • Offers configurations for GRPO, LoRA, and ZeRO-3, plus an option to disable the KL-divergence penalty for performance gains.
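
The cost claim in the first bullet implies a ceiling on the rental price per GPU. As a back-of-the-envelope sketch (the function names are hypothetical, and the $1.50 rate in the second call is an assumed example price, not a figure from the project):

```python
def max_gpu_hour_rate(total_budget_usd: float, num_gpus: int, hours: float) -> float:
    """Highest per-GPU hourly price that keeps the run within budget."""
    return total_budget_usd / (num_gpus * hours)


def run_cost(rate_usd_per_gpu_hour: float, num_gpus: int, hours: float) -> float:
    """Total cost of a run at a given per-GPU hourly price."""
    return rate_usd_per_gpu_hour * num_gpus * hours


# Figures from the claim: 4 GPUs, about 1 hour, under $7 in total.
rate_ceiling = max_gpu_hour_rate(7.0, num_gpus=4, hours=1.0)

# Assumed example price of $1.50 per GPU-hour (hypothetical).
example_cost = run_cost(1.50, num_gpus=4, hours=1.0)
```

In other words, the claim holds for any rental price below $1.75 per GPU-hour, provided the run really does finish in about an hour.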

Maintenance & Community

  • Active development with recent updates in February 2025.
  • Contact: dhcode95@gmail.com.
  • Acknowledges Open-R1 and TRL.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

  • The project is under active development; its "Todo" list includes QLoRA support and integration of more base models.
  • The lack of a stated license is a significant adoption risk.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 55 stars in the last 90 days
