R-Zero by Chengsong-Huang

Self-evolving LLM training from zero data

Created 5 months ago

725 stars

Top 47.6% on SourcePulse

2 Experts Love This Project

Yiran Wu

Coauthor of AutoGen

Phil Wang

Prolific Research Paper Implementer

Project Summary

R-Zero offers a framework for LLMs to autonomously improve their reasoning capabilities without requiring any external data or human-curated datasets. It targets researchers and developers aiming to build self-improving AI systems, enabling LLMs to learn from scratch through a novel co-evolutionary process.

How It Works

R-Zero runs a co-evolutionary loop between two instances of the same base LLM: a Challenger and a Solver. The Challenger generates problems at the edge of the Solver's current capability, and the Solver learns to solve these increasingly difficult tasks. The result is an adaptive curriculum: majority voting over the Solver's sampled answers supplies pseudo-labels, and both models are updated with Group Relative Policy Optimization (GRPO), so they improve iteratively.
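As a rough illustration of the majority-voting step, the sketch below derives a pseudo-label from several sampled Solver answers and scores how close a question sits to the Solver's capability edge. The function names, the scoring formula, and the 0.3 to 0.8 filter band are assumptions made for this example; they are not taken from the repository.

```python
from collections import Counter


def majority_vote(answers):
    """Return (pseudo_label, agreement) for a list of sampled answers.

    agreement is the fraction of samples that match the most common answer.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)


def near_edge_score(agreement):
    """Hypothetical score: 1.0 at 50% agreement, 0.0 at 0% or 100%.

    Assumption for this sketch: a question is most useful when the
    Solver is right about half the time.
    """
    return 1.0 - 2.0 * abs(agreement - 0.5)


def keep_for_training(agreement, low=0.3, high=0.8):
    """Hypothetical filter: drop questions that look too easy or too hard."""
    return low <= agreement <= high


# Four sampled answers to one Challenger question.
label, agreement = majority_vote(["4", "4", "5", "4"])
print(label, agreement, near_edge_score(agreement), keep_for_training(agreement))
```

The pseudo-label is only as reliable as the vote behind it, which is why a sketch like this would discard questions where agreement is very low.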

Quick Start & Requirements

Install: Clone the repository, navigate into the directory, and run pip install -r requirements.txt.
Prerequisites: Set the STORAGE_PATH environment variable (where checkpoints and data are stored) and HUGGINGFACENAME (used for dataset uploads). API keys for Hugging Face, WandB, and OpenAI GPT must be supplied in tokens.json and evaluation/results_recheck.py.
Run: Execute bash scripts/main.sh [Base_Model_Name] [Abbreviation], e.g., bash scripts/main.sh Qwen/Qwen3-4B-Base qwen3-4b.
Resources: Experiments were conducted on an 8-GPU server; modifications may be needed for different hardware or larger models. Refer to EasyR1 for environment setup guidance.
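The prerequisites above can be checked before launching a long run. The following is a minimal pre-flight sketch: the helper names `missing_env` and `build_command` are hypothetical, and only the variable names and the `scripts/main.sh` invocation come from the steps listed above.

```python
import os

# Environment variables named in the prerequisites.
REQUIRED_ENV = ("STORAGE_PATH", "HUGGINGFACENAME")


def missing_env(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]


def build_command(base_model, abbreviation):
    """Build the argument list for the documented launch script."""
    return ["bash", "scripts/main.sh", base_model, abbreviation]


missing = missing_env()
if missing:
    print("Set these variables first:", ", ".join(missing))
else:
    print("Ready to run:", " ".join(build_command("Qwen/Qwen3-4B-Base", "qwen3-4b")))
```

From the repository root, the resulting list could be passed to `subprocess.run(cmd, check=True)` to start training.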

Highlighted Details

Demonstrates significant performance gains on reasoning benchmarks like MATH, SuperGPQA, MMLU-Pro, and BBEH.
Achieves strong generalization, transferring learned reasoning skills to new domains.
Model-agnostic, improving performance across various backbone LLMs (e.g., Qwen, OctoThinker).
Framework structure is inspired by EasyR1, with evaluation referencing General-Reasoner.

Maintenance & Community

The project is based on EasyR1 and references General-Reasoner. No specific community channels or active maintenance signals are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. However, its dependency on EasyR1, which is Apache 2.0 licensed, suggests potential compatibility. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

Calls into the math_verify library can enter an infinite loop during questioner (Challenger) training, in which case training must be restarted from a checkpoint. The README also notes that the code may need changes for hardware other than the described 8-GPU setup.
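One general way to contain a call that may never return is to run it in a child process with a time limit. The sketch below shows that pattern with the standard library only; `run_with_timeout` is a hypothetical helper for illustration and is not how the repository handles the issue.

```python
import subprocess
import sys


def run_with_timeout(code, timeout_s):
    """Run a Python snippet in a child process.

    Returns the child's stdout (stripped), or None if the time limit is hit.
    subprocess.run kills the child when the timeout expires.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.strip()


# A well-behaved check returns its output; a hanging one yields None.
print(run_with_timeout("print(1 + 1)", timeout_s=10))
print(run_with_timeout("while True: pass", timeout_s=1))
```

Treating a timed-out check as a failed verification lets the surrounding loop move on instead of stalling.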

Health Check

Last Commit

3 weeks ago

Responsiveness

Inactive

Star History

29 stars in the last 30 days