Self-play reasoning framework needing zero data
Absolute Zero Reasoner (AZR) is an open-source framework for training large language models to perform complex reasoning tasks, such as code generation and mathematical problem-solving, entirely through self-play and reinforcement learning, without relying on any external datasets. It targets researchers and developers looking to enhance LLM reasoning capabilities in a data-efficient manner.
How It Works
AZR runs an iterative self-play loop with two phases: PROPOSE and SOLVE. In PROPOSE, the model generates reasoning tasks of three types (abduction, deduction, and induction), validates them by executing them in Python, and receives a learnability reward. In SOLVE, the model attempts these self-generated tasks and receives an accuracy reward when Python execution verifies its answer. Trained with the TRR++ algorithm, this loop lets the model improve its reasoning skills without external supervision.
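The two reward signals described above can be illustrated with a minimal sketch. It is not the repository's implementation: the function names, the convention that a task is a program defining `f(x)`, and the exact shape of the learnability reward (zero for tasks that are always or never solved, otherwise one minus the success rate) are assumptions made for illustration, and the sampled answers are made-up stand-ins for model outputs.

```python
from statistics import mean
from typing import Sequence


def run_program(source: str, arg):
    """Execute `source`, which must define f(x), and return f(arg).

    Uses bare exec() for brevity; this is NOT a sandbox.
    """
    namespace: dict = {}
    exec(source, namespace)
    return namespace["f"](arg)


def is_valid_task(source: str, arg, trials: int = 2) -> bool:
    """PROPOSE-side check: the program runs and gives the same output each time."""
    try:
        outputs = [run_program(source, arg) for _ in range(trials)]
    except Exception:
        return False
    return all(out == outputs[0] for out in outputs)


def accuracy_reward(predicted, expected) -> float:
    """SOLVE-side reward: 1.0 when the answer matches the executed result."""
    return 1.0 if predicted == expected else 0.0


def learnability_reward(solve_rewards: Sequence[float]) -> float:
    """Assumed shape: 0 for tasks always or never solved, else 1 - success rate."""
    rate = mean(solve_rewards)
    if rate == 0.0 or rate == 1.0:
        return 0.0
    return 1.0 - rate


# A deduction-style task: given a program and an input, predict the output.
program = "def f(x):\n    return sorted(x)[::-1]"
task_input = [3, 1, 2]
assert is_valid_task(program, task_input)
expected = run_program(program, task_input)

# Hypothetical sampled answers standing in for model outputs.
answers = [[3, 2, 1], [1, 2, 3], [3, 2, 1], [3, 2, 1]]
solve_rewards = [accuracy_reward(a, expected) for a in answers]
print(solve_rewards, learnability_reward(solve_rewards))
```

Under these assumptions, a task solved three times out of four earns the proposer a reward of 0.25, while a task that is trivially easy or impossible earns nothing, which pushes proposals toward tasks at the edge of the solver's ability.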
Quick Start & Requirements
Requires vllm (0.7.3) and transformers (4.47.1). Installation involves setting up a Conda environment, installing dependencies via pip, and potentially building flash-attn.
Highlighted Details
Maintenance & Community
The project is actively developed by LeapLabTHU. Links to WandB logs and contact information for a primary author are provided. A roadmap indicates planned updates for evaluation code and executor improvements.
Licensing & Compatibility
The repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The authors describe the bundled Python executor as "very raw" and "not secure for production environments," and plan a more secure implementation. The project is research-oriented, and users assume all risks.
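For readers experimenting with model-generated code, one common precaution is to run it in a separate process with a time limit. The sketch below is an illustration only, not the project's executor; `run_isolated` is a hypothetical helper, and a child process with a timeout contains runaway loops and crashes but is not a security boundary.

```python
import subprocess
import sys


def run_isolated(source: str, timeout_s: float = 2.0) -> tuple[bool, str]:
    """Run `source` in a child Python process; return (ok, stdout or error text)."""
    try:
        proc = subprocess.run(
            # -I is Python's isolated mode: ignores PYTHON* env vars and user site-packages.
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return False, "timeout"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


print(run_isolated("print(sum(range(10)))"))
print(run_isolated("while True: pass", timeout_s=0.5))
```

Untrusted code run this way can still read files and open network connections, so stronger isolation such as containers or restricted user accounts would be needed outside a research setting.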