X-R1 by dhcode-cpp

RL framework for low-cost training of 0.5B+ models

Created 1 year ago

808 stars

Top 43.7% on SourcePulse

Project Summary

X-R1 is a low-cost, end-to-end reinforcement learning framework for training large language models, with a focus on improving reasoning and format-following. It targets researchers and developers who want to fine-tune models such as Qwen, from 0.5B to 7B parameters, on limited hardware. The project claims that a 0.5B model can be trained in under an hour for less than $7.

How It Works

The framework trains with GRPO (Group Relative Policy Optimization), a variant of Proximal Policy Optimization (PPO). It supports LoRA and DeepSpeed ZeRO-3 ("Zero3") to reduce memory footprint and training time. Training optimizes specific reward signals to elicit an "Aha Moment" in the model, with the aim of improving reasoning and adherence to the desired output format.
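The group-relative step that gives GRPO its name can be sketched as follows. This is a minimal illustration under stated assumptions, not X-R1's own code: the tag layout checked by `format_reward`, the function names, and the `eps` value are all hypothetical choices made for the example.

```python
import re
from statistics import mean, pstdev

# Hypothetical output layout; the tag names are an assumption for illustration.
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against the mean and spread of its own group.

    All completions in the group are sampled for the same prompt, so the
    group itself serves as the baseline and no value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt (made-up strings).
completions = [
    "<think>2 + 2 is 4</think><answer>4</answer>",
    "The answer is 4",
    "<think>guess</think><answer>5</answer>",
    "4",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that follow the format receive a positive advantage and the others a negative one. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.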

Quick Start & Requirements

Install (run in order):
conda create -n xr1 python=3.11
conda activate xr1
pip install -r requirements.txt
pip install flash-attn
Prerequisites: CUDA >= 12.4.
Setup: Create a Conda environment and install the dependencies. The training examples are started with accelerate launch and dedicated configuration files.
Links: wandb details, Colab Inference, Models.
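A launch invocation of the kind described above might be assembled like this. It is only a sketch: every path is a placeholder rather than a file from the repository, the process count of 4 simply mirrors the 4-GPU setup mentioned in the summary, and the `--config` argument passed to the training script is an assumption about how that script reads its recipe. Only `--config_file` and `--num_processes` are options of `accelerate launch` itself.

```python
import shlex


def build_launch_command(
    accelerate_config: str,
    train_script: str,
    recipe: str,
    num_processes: int,
) -> list[str]:
    """Assemble an `accelerate launch` command line as a list of arguments."""
    return [
        "accelerate", "launch",
        "--config_file", accelerate_config,      # accelerate/DeepSpeed settings
        "--num_processes", str(num_processes),   # number of training processes
        train_script,
        "--config", recipe,                      # assumed script-level flag
    ]


# All paths below are hypothetical placeholders.
cmd = build_launch_command(
    accelerate_config="path/to/accelerate_zero3.yaml",
    train_script="path/to/train_grpo.py",
    recipe="path/to/recipe_0dot5B.yaml",
    num_processes=4,
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting problems once the placeholders are replaced with real files.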

Highlighted Details

Trains 0.5B models in ~1 hour on 4x3090/4090 GPUs for under $7.
Supports scaling to larger models (1.5B, 7B, 32B) with provided datasets.
Includes benchmark evaluation results for MATH500 and Chinese math reasoning.
Offers configurations for GRPO, LoRA, and Zero3, plus an option to disable the KL divergence term for performance gains.
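The cost claim in the first item can be checked with simple arithmetic. The sketch below derives the GPU rental price implied by "under $7 for about 1 hour on 4 GPUs"; the $0.50 rate used in the second calculation is a made-up figure for illustration, not a quoted price.

```python
def implied_gpu_hour_price(total_cost_usd: float, num_gpus: int, hours: float) -> float:
    """Price per GPU-hour at which a run costs exactly `total_cost_usd`."""
    return total_cost_usd / (num_gpus * hours)


def run_cost(price_per_gpu_hour: float, num_gpus: int, hours: float) -> float:
    """Total cost of a run billed per GPU-hour."""
    return price_per_gpu_hour * num_gpus * hours


# $7 across 4 GPUs for 1 hour implies a ceiling of $1.75 per GPU-hour.
ceiling = implied_gpu_hour_price(7.0, num_gpus=4, hours=1.0)

# Hypothetical rental rate of $0.50 per GPU-hour: the same run costs $2.00.
example = run_cost(0.50, num_gpus=4, hours=1.0)
```

The claim therefore holds for any rental rate below $1.75 per GPU-hour, provided the run finishes within the hour.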

Maintenance & Community

Active development with recent updates in February 2025.
Contact: dhcode95@gmail.com.
Acknowledges Open-R1 and TRL.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The project is in active development with a "Todo" list including QLoRA support and more base model integrations. The absence of a specified license poses a significant adoption risk.

Health Check

Last Commit

9 months ago

Responsiveness

1 day

Star History

5 stars in the last 30 days