X-R1 by dhcode-cpp

RL framework for low-cost training of 0.5B+ models

Created 7 months ago
769 stars

Top 45.4% on SourcePulse

Project Summary

X-R1 is a low-cost, end-to-end reinforcement learning framework for training large language models, aimed at improving reasoning and format-following. It targets researchers and developers who want to fine-tune models such as Qwen, from 0.5B to 7B parameters, on limited hardware. The project claims that a 0.5B model can be trained in under an hour for less than $7.

How It Works

The framework trains with GRPO (Group Relative Policy Optimization), a variant of Proximal Policy Optimization (PPO). It supports LoRA and DeepSpeed ZeRO-3 (written "Zero3" in the project) to reduce memory use and training time. Training optimizes specific reward signals with the aim of eliciting an "Aha Moment": stronger reasoning and closer adherence to the desired output format.
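The reward-then-normalize idea behind GRPO can be sketched in a few lines. This is a minimal illustration, not X-R1's implementation: the tag layout in FORMAT_PATTERN, the function names, and the sample completions are all assumptions made for the example, and the use of a population standard deviation is one possible choice.

```python
import re
from statistics import mean, pstdev

# Assumed format (not taken from the project): <think>...</think> then <answer>...</answer>.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sample relative to the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions sampled for one prompt.
completions = [
    "<think>2 + 2 = 4</think><answer>4</answer>",
    "The answer is 4.",
    "<think>unfinished",
    "<think>Add the numbers.</think>\n<answer>4</answer>",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
```

In this sketch the two well-formatted completions receive positive advantages and the other two receive negative ones, so a policy update would push the model toward the assumed format. When every sample in a group earns the same reward, all advantages are zero and the group contributes no learning signal.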

Quick Start & Requirements

  • Install (run in order): conda create -n xr1 python=3.11; conda activate xr1; pip install -r requirements.txt; pip install flash-attn.
  • Prerequisites: CUDA >= 12.4.
  • Setup: Create the Conda environment and install the dependencies first. The training examples are then started with accelerate launch and a configuration file for each setup.
  • Links: wandb details, Colab Inference, Models.

Highlighted Details

  • Trains 0.5B models in ~1 hour on 4x3090/4090 GPUs for under $7.
  • Supports scaling to larger models (1.5B, 7B, 32B) with provided datasets.
  • Includes benchmark evaluation results for MATH500 and Chinese math reasoning.
  • Offers configurations for GRPO, LoRA, and ZeRO-3, plus an option to disable the KL-divergence term, which is intended to improve performance.
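Two back-of-the-envelope calculations help put these points in context. The first derives the GPU rental price implied by the headline cost claim, assuming the run takes exactly one hour and GPU rental is the only cost. The second counts the trainable parameters LoRA adds to a single weight matrix; the layer size and rank are hypothetical values chosen for illustration, not settings taken from X-R1.

```python
# Cost bound implied by the headline claim.
num_gpus = 4        # from the summary: 4x 3090/4090
hours = 1.0         # assumption: "~1 hour" taken as exactly one hour
budget_usd = 7.0    # from the summary: "under $7"
max_price_per_gpu_hour = budget_usd / (num_gpus * hours)


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight.

    LoRA learns two low-rank factors, B (d_out x rank) and A (rank x d_in),
    while the original weight stays frozen.
    """
    return rank * (d_out + d_in)


d_out = d_in = 1024  # hypothetical layer size
rank = 8             # hypothetical LoRA rank
full = d_out * d_in
lora = lora_param_count(d_out, d_in, rank)
fraction = lora / full
```

Under these assumptions the claim holds whenever a GPU rents for less than $1.75 per hour, and the adapter trains 16,384 parameters against 1,048,576 in the full matrix, about 1.6%.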

Maintenance & Community

  • Active development with recent updates in February 2025.
  • Contact: dhcode95@gmail.com.
  • Acknowledges Open-R1 and TRL.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

  • The project is in active development with a "Todo" list including QLoRA support and more base model integrations. The absence of a specified license poses a significant adoption risk.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
