Open-source RL training for scalable reasoning on base models
Top 22.5% on sourcepulse
Open-Reasoner-Zero provides an open-source implementation of large-scale, reasoning-oriented reinforcement learning (RL) training, with a focus on scalability, simplicity, and accessibility. It targets researchers and developers aiming to advance artificial general intelligence (AGI) through efficient RL training on base models. The project offers a complete suite of code, data, and model weights, achieving superior performance on benchmarks such as AIME2024 and MATH500 with significantly fewer training steps.
How It Works
The project employs a minimalist RL training recipe built around a single-controller trainer design that co-locates training and generation on the same GPUs to maximize utilization. Built on OpenRLHF, vLLM, DeepSpeed, and Ray, this approach scales efficiently across model sizes from 0.5B to 32B parameters while maintaining training stability and robustness.
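The snippet below is a minimal sketch of the co-location idea only, not the repository's actual implementation: one Ray actor per GPU holds both a vLLM engine for rollout generation and a DeepSpeed-wrapped copy of the same model for updates, so the two phases alternate on the same device instead of occupying separate GPU pools. The class name, base model, and prompt are illustrative, and the RL objective is replaced by a plain likelihood step to keep the example self-contained.

import os
import ray
import torch
import deepspeed
from vllm import LLM, SamplingParams
from transformers import AutoModelForCausalLM, AutoTokenizer

@ray.remote(num_gpus=1)
class ColocatedWorker:
    # Illustrative actor: the generation engine and the trainer share one GPU.
    def __init__(self, model_name: str):
        # Single-process distributed defaults so DeepSpeed can initialize inside the Ray actor.
        for key, val in {"RANK": "0", "LOCAL_RANK": "0", "WORLD_SIZE": "1",
                         "MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500"}.items():
            os.environ.setdefault(key, val)
        # Generation engine: cap its memory share so the trainer fits on the same GPU.
        self.engine = LLM(model=model_name, gpu_memory_utilization=0.4)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.tokenizer.pad_token = self.tokenizer.pad_token or self.tokenizer.eos_token
        # Trainer: the same base model wrapped by DeepSpeed (minimal config for brevity).
        model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
        self.policy, _, _, _ = deepspeed.initialize(
            model=model,
            model_parameters=model.parameters(),
            config={
                "train_micro_batch_size_per_gpu": 1,
                "bf16": {"enabled": True},
                "optimizer": {"type": "Adam", "params": {"lr": 1e-6}},
            },
        )

    def generate(self, prompts):
        # Rollout phase: sample reasoning traces with the inference engine.
        params = SamplingParams(temperature=1.0, max_tokens=512)
        return [out.outputs[0].text for out in self.engine.generate(prompts, params)]

    def train_step(self, texts):
        # Update phase: a stand-in likelihood step on the sampled completions;
        # the project's actual RL objective and reward computation are omitted here.
        batch = self.tokenizer(texts, return_tensors="pt", padding=True,
                               truncation=True, max_length=512).to(self.policy.device)
        loss = self.policy(input_ids=batch.input_ids,
                           attention_mask=batch.attention_mask,
                           labels=batch.input_ids).loss
        self.policy.backward(loss)
        self.policy.step()
        return loss.item()

if __name__ == "__main__":
    ray.init()
    worker = ColocatedWorker.remote("Qwen/Qwen2.5-0.5B")  # base model name illustrative
    rollouts = ray.get(worker.generate.remote(["Prove that the square of an odd number is odd."]))
    print("stand-in loss:", ray.get(worker.train_step.remote(rollouts)))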
Quick Start & Requirements
pip install -e .
# Set DEBUG_MODE=True when launching a training script for a lightweight debug run, e.g. (script name illustrative):
DEBUG_MODE=True python -m playground.orz_0p5b_ppo
Highlighted Details
Maintenance & Community
The project is actively maintained with recent updates in March 2025, including new results, training scripts, curated datasets, and Hugging Face models. Community discussions are facilitated via WeChat groups.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README, but it is released as open-source. Compatibility with commercial use or closed-source linking would require clarification.
Limitations & Caveats
The README mentions a "WIP" paper, suggesting ongoing development. Specific licensing details for commercial use are not provided, which may pose a limitation for some adoption scenarios.