Self-evolving LLM training from zero data
R-Zero is a framework that lets LLMs autonomously improve their reasoning capabilities without any external data or human-curated datasets. It targets researchers and developers building self-improving AI systems, enabling LLMs to learn from scratch through a novel co-evolutionary process.
How It Works
R-Zero employs a dynamic loop between two instances of a base LLM: a Challenger and a Solver. The Challenger generates challenging problems at the Solver's current capability edge, while the Solver learns to solve these increasingly difficult tasks. This creates an adaptive curriculum, guided by techniques like majority voting for pseudo-labels and relative policy optimization, allowing both models to improve iteratively.
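The loop below is a minimal Python sketch of this co-evolution, not R-Zero's actual implementation: `challenger_generate`, `solver_answer`, the difficulty schedule, and the reward commentary are illustrative placeholders, and the pseudo-label is simply the majority vote over several Solver samples.

```python
from collections import Counter
import random

# Toy stand-ins for the two co-evolving instances of the base LLM.
# These are hypothetical placeholders, not R-Zero's real API.
def challenger_generate(difficulty: float) -> str:
    """Generate a problem near the Solver's current capability edge."""
    return f"problem(difficulty={difficulty:.2f})"

def solver_answer(problem: str) -> str:
    """Sample one candidate answer from the Solver."""
    return random.choice(["A", "B"])

def co_evolve(iterations: int = 3, num_samples: int = 8) -> None:
    difficulty = 0.5
    for step in range(iterations):
        # 1. Challenger proposes a batch of problems at the capability edge.
        problems = [challenger_generate(difficulty) for _ in range(4)]

        for problem in problems:
            # 2. Solver samples several answers; the majority vote serves as
            #    the pseudo-label (no human or external labels are used).
            answers = [solver_answer(problem) for _ in range(num_samples)]
            pseudo_label, votes = Counter(answers).most_common(1)[0]
            agreement = votes / num_samples

            # 3. In R-Zero, both models would then be updated with relative
            #    policy optimization: the Solver is rewarded for matching the
            #    pseudo-label, the Challenger for posing problems the Solver
            #    finds genuinely hard but not impossible.
            print(f"step={step} {problem} label={pseudo_label} "
                  f"agreement={agreement:.2f}")

        # 4. Ratchet difficulty upward so the curriculum tracks the Solver.
        difficulty = min(1.0, difficulty + 0.1)

if __name__ == "__main__":
    co_evolve()
```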
Quick Start & Requirements
Install dependencies with `pip install -r requirements.txt`. Set the `STORAGE_PATH` environment variable for checkpoints and data, and `HUGGINGFACENAME` for dataset uploads. API keys for Hugging Face, WandB, and OpenAI GPT are required in `tokens.json` and `evaluation/results_recheck.py`. Launch training with `bash scripts/main.sh [Base_Model_Name] [Abbreviation]`, e.g., `bash scripts/main.sh Qwen/Qwen3-4B-Base qwen3-4b`.
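As a convenience-only sketch, the snippet below performs the same setup from Python; the storage path and Hugging Face username are placeholder values, and the final call simply wraps the documented `scripts/main.sh` invocation.

```python
import os
import subprocess

# Placeholder values; point these at your own storage location and
# Hugging Face account before running.
os.environ["STORAGE_PATH"] = "/data/r-zero"          # checkpoints and data
os.environ["HUGGINGFACENAME"] = "your-hf-username"   # dataset uploads

# Equivalent to: bash scripts/main.sh Qwen/Qwen3-4B-Base qwen3-4b
subprocess.run(
    ["bash", "scripts/main.sh", "Qwen/Qwen3-4B-Base", "qwen3-4b"],
    check=True,
)
```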
Highlighted Details
Maintenance & Community
The project is based on EasyR1 and references General-Reasoner. No specific community channels or active maintenance signals are provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Its dependency on EasyR1, which is Apache 2.0 licensed, suggests likely compatibility, but users should verify licensing before commercial or closed-source use.
Limitations & Caveats
The code may encounter infinite loops in the `math_verify` library during Challenger (questioner) training, requiring restarts from checkpoints. The README suggests modifying the code for hardware configurations other than the described 8-GPU setup.