Open-Reasoner-Zero  by Open-Reasoner-Zero

Open-source RL training for scalable reasoning on base models

created 5 months ago
2,011 stars

Top 22.5% on sourcepulse

Project Summary

Open-Reasoner-Zero provides an open-source implementation for large-scale, reasoning-oriented Reinforcement Learning (RL) training, focusing on scalability, simplicity, and accessibility. It targets researchers and developers aiming to advance artificial general intelligence (AGI) through efficient RL training on base models. The project offers a complete suite of code, data, and model weights, enabling superior performance on benchmarks like AIME2024 and MATH500 with significantly reduced training steps.

How It Works

The project employs a minimalist RL training recipe, utilizing a single controller trainer design that co-locates training and generation on the same GPUs to maximize utilization. This approach, built upon frameworks like OpenRLHF, vLLM, DeepSpeed, and Ray, allows for efficient scaling across various model sizes (0.5B to 32B parameters) while maintaining training stability and robustness.
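The co-located design described above can be illustrated with a minimal sketch. This is not the project's actual code (which builds on OpenRLHF, vLLM, DeepSpeed, and Ray); `ColocatedWorker` and `rl_iteration` are hypothetical stand-ins showing the core idea: generation and training alternate on the same worker, so the GPU is never idle waiting on a separate inference pool.

```python
# Minimal sketch (NOT the project's actual code) of a co-located
# trainer: rollout generation and policy updates share one worker.
from dataclasses import dataclass, field


@dataclass
class ColocatedWorker:
    # Hypothetical stand-in for a GPU worker hosting both the
    # vLLM-style generator and the DeepSpeed-style trainer.
    model_version: int = 0
    log: list = field(default_factory=list)

    def generate(self, prompts):
        # Phase 1: roll out responses with the current policy weights.
        self.log.append(("generate", self.model_version))
        return [f"response:{p}" for p in prompts]

    def train_step(self, rollouts):
        # Phase 2: reuse the same device for the policy update,
        # then bump the weights the next rollout phase will use.
        self.log.append(("train", self.model_version))
        self.model_version += 1


def rl_iteration(worker, prompts, steps):
    # Generation and training strictly alternate on one worker,
    # rather than running on disjoint GPU pools.
    for _ in range(steps):
        rollouts = worker.generate(prompts)
        worker.train_step(rollouts)
    return worker


w = rl_iteration(ColocatedWorker(), ["q1", "q2"], steps=2)
print(w.model_version)  # 2
print(w.log)
```

The design choice this illustrates: because both phases share the same devices, no GPUs sit idle while the other pool works, which is the utilization argument the project makes for its single-controller trainer.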

Quick Start & Requirements

  • Installation: pip install -e .
  • Prerequisites: A Docker image is provided for reproducibility. Training scripts cover multi-node setups via Ray for the larger models (7B, 32B) and single-node/single-GPU runs for the smaller ones (0.5B, 1.5B). Set DEBUG_MODE=True for debugging runs.
  • Resources: The 0.5B model can be trained on a single A800/H800 GPU. Multi-node setups are detailed for 7B (4 nodes) and 32B (16 nodes) models.
  • Links: Paper, HF Models, Data
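The installation and debug options above can be sketched as follows. Note that `pip install -e .` and `DEBUG_MODE=True` come from the README, but the training-script path here is hypothetical; check the repository for the actual script names.

```shell
# Clone the repository and install in editable mode (per the README)
git clone https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero.git
cd Open-Reasoner-Zero
pip install -e .

# Hypothetical single-GPU debug run for the 0.5B model;
# substitute the actual training script provided in the repo.
DEBUG_MODE=True python train_0.5b.py
```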

Highlighted Details

  • Achieves superior performance on AIME2024, MATH500, and GPQA Diamond benchmarks.
  • Requires only about a tenth of the training steps needed by comparable pipelines.
  • Demonstrates consistent scalability across model sizes from 0.5B to 32B.
  • Offers a full suite of critic models for in-depth research.

Maintenance & Community

The project is actively maintained with recent updates in March 2025, including new results, training scripts, curated datasets, and Hugging Face models. Community discussions are facilitated via WeChat groups.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README, but it is released as open-source. Compatibility with commercial use or closed-source linking would require clarification.

Limitations & Caveats

The README mentions a "WIP" paper, suggesting ongoing development. Specific licensing details for commercial use are not provided, which may pose a limitation for some adoption scenarios.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 120 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley).

SWE-Gym by SWE-Gym

Environment for training software engineering agents

  • 513 stars (top 1.0%)
  • created 9 months ago
  • updated 4 days ago