RL framework for low-cost training of 0.5B+ models
X-R1 is a minimal-cost, end-to-end reinforcement learning framework designed to accelerate the training of large language models, particularly for improving reasoning and format-following capabilities. It targets researchers and developers looking to efficiently fine-tune models like Qwen from 0.5B to 7B parameters with limited resources, claiming costs under $7 for training a 0.5B model in under an hour.
How It Works
The framework uses GRPO (Group Relative Policy Optimization), a variant of Proximal Policy Optimization (PPO), for reinforcement learning. It supports memory-saving techniques such as LoRA and ZeRO Stage 3 (Zero3), which reduce the memory footprint and shorten training time. Training optimizes specific reward signals with the aim of eliciting an "Aha Moment" and improving both the model's reasoning and its adherence to the desired output format.
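To make the reward-driven loop more concrete, here is a minimal sketch of two pieces a GRPO-style setup typically needs: a format reward and a group-relative advantage. It is not X-R1's actual code. The `<think>`/`<answer>` tag layout, the function names `format_reward` and `group_relative_advantages`, and the `eps` constant are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Assumed output layout (hypothetical): reasoning in <think>, result in <answer>.
FORMAT_PATTERN = re.compile(
    r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL
)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sample relative to the other samples for the same prompt.

    The group's own mean and standard deviation act as the baseline,
    so no separate value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt (made-up strings).
completions = [
    "<think>2 + 2 = 4</think> <answer>4</answer>",
    "The answer is 4.",
    "<think>guess</think> <answer>5</answer>",
    "4",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that follow the format receive a positive advantage and the others a negative one, which is the signal that pushes the policy toward the desired output structure. A real setup would usually add an accuracy reward alongside the format reward.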
Quick Start & Requirements
Install:
conda create -n xr1 python=3.11
conda activate xr1
pip install -r requirements.txt
pip install flash-attn
Training is launched via accelerate launch with specific configuration files.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats