Open-Reasoner-Zero by Open-Reasoner-Zero

Open-source RL training for scalable reasoning on base models

Created 7 months ago
2,039 stars

Top 21.8% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Open-Reasoner-Zero provides an open-source implementation of large-scale, reasoning-oriented Reinforcement Learning (RL) training, focused on scalability, simplicity, and accessibility. It targets researchers and developers aiming to advance artificial general intelligence (AGI) through efficient RL training on base models. The project offers a complete suite of code, data, and model weights, enabling superior performance on benchmarks like AIME2024 and MATH500 in significantly fewer training steps.

How It Works

The project employs a minimalist RL training recipe with a single-controller trainer design that co-locates training and generation on the same GPUs to maximize utilization. This approach, built on frameworks such as OpenRLHF, vLLM, DeepSpeed, and Ray, scales efficiently across model sizes (0.5B to 32B parameters) while maintaining training stability and robustness.
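The co-location idea can be illustrated with a minimal sketch; this is an assumption-laden toy, not the project's actual code: `rollout` stands in for vLLM generation and `ppo_update` for the DeepSpeed-backed trainer step, alternating on the same device so neither the generator nor the trainer leaves GPUs idle.

```python
# Hypothetical sketch of a co-located RL loop: generation and the
# policy update share one device, alternating within the same process.

def rollout(prompts):
    # Stand-in for vLLM generation on this GPU.
    return [(p, p + " <answer>") for p in prompts]

def ppo_update(batch):
    # Stand-in for the PPO/DeepSpeed training step on the same GPU.
    return {"loss": round(1.0 / (1 + len(batch)), 3)}

def train(prompts, steps=3):
    history = []
    for _ in range(steps):
        batch = rollout(prompts)   # the GPU first generates rollouts...
        stats = ppo_update(batch)  # ...then immediately trains on them
        history.append(stats["loss"])
    return history

print(train(["2+2=?"]))  # [0.5, 0.5, 0.5]
```

In the real system each such worker would be a Ray actor pinned to a GPU, with vLLM and DeepSpeed engines sharing that device rather than occupying separate generation and training pools.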

Quick Start & Requirements

  • Installation: pip install -e .
  • Prerequisites: A Docker image is provided for reproducibility. Training scripts cover multi-node setups using Ray for larger models (7B, 32B) and single-node/single-GPU runs for smaller models (0.5B, 1.5B). Debugging can be enabled with DEBUG_MODE=True.
  • Resources: The 0.5B model can be trained on a single A800/H800 GPU. Multi-node setups are detailed for 7B (4 nodes) and 32B (16 nodes) models.
  • Links: Paper, HF Models, Data
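Put together, a minimal local setup might look like the following sketch. The GitHub path assumes the standard org/repo layout, and the training-script name is a placeholder; check the repository's scripts for the actual entry points, and note that multi-node runs additionally require a Ray cluster.

```shell
# Clone and install in editable mode (pip install -e ., per the README).
git clone https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero.git
cd Open-Reasoner-Zero
pip install -e .

# Single-GPU debug run; DEBUG_MODE=True is the documented debug switch.
# The script name below is hypothetical -- substitute the repo's 0.5B script.
DEBUG_MODE=True python train_0p5b.py
```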

Highlighted Details

  • Achieves superior performance on AIME2024, MATH500, and GPQA Diamond benchmarks.
  • Requires roughly a tenth of the training steps of comparable pipelines.
  • Demonstrates consistent scalability across model sizes from 0.5B to 32B.
  • Offers a full suite of critic models for in-depth research.

Maintenance & Community

The project is actively maintained with recent updates in March 2025, including new results, training scripts, curated datasets, and Hugging Face models. Community discussions are facilitated via WeChat groups.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README, but it is released as open-source. Compatibility with commercial use or closed-source linking would require clarification.

Limitations & Caveats

The README mentions a "WIP" paper, suggesting ongoing development. Specific licensing details for commercial use are not provided, which may pose a limitation for some adoption scenarios.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 18 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

  • Top 0.1% · 4k stars
  • RL recipe for reasoning ability in models
  • Created 7 months ago · Updated 1 month ago