Absolute-Zero-Reasoner by LeapLabTHU

Self-play reasoning framework that needs no external data

Created 10 months ago

1,816 stars

Top 23.4% on SourcePulse

2 Experts Love This Project

Wing Lian

Founder of Axolotl AI

Yiran Wu

Coauthor of AutoGen

Project Summary

Absolute Zero Reasoner (AZR) is an open-source framework for training large language models to perform complex reasoning tasks, such as code generation and mathematical problem-solving, entirely through self-play and reinforcement learning, without relying on any external datasets. It targets researchers and developers looking to enhance LLM reasoning capabilities in a data-efficient manner.

How It Works

AZR runs an iterative self-play loop with two phases: PROPOSE and SOLVE. In PROPOSE, the model generates reasoning tasks of three kinds (abduction, deduction, and induction), validates them by executing them in Python, and receives a learnability reward. In SOLVE, the model attempts these self-generated tasks and receives an accuracy reward when Python execution verifies its answer. The loop is trained with the TRR++ (Task-Relative REINFORCE++) algorithm, so the model improves its reasoning without human-curated tasks.
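The PROPOSE/SOLVE loop described above can be illustrated with a toy sketch. This is not the project's code: the `Task` class, the stub `propose` and `solve` functions, and the `skill` parameter are hypothetical stand-ins for model calls, and only deduction-style tasks (predict the output of a program on an input) are shown. The learnability reward shape used here (zero when a task is always or never solved, otherwise one minus the solve rate) is an assumption about one reasonable form, not something the summary above specifies.

```python
import random
from dataclasses import dataclass


@dataclass
class Task:
    kind: str      # "abduction", "deduction", or "induction"
    program: str   # Python source defining f(x)
    input: int
    output: int    # ground truth obtained by executing the program


def run_program(program: str, x: int) -> int:
    """Execute source that defines f(x) and return f(x). Not sandboxed."""
    scope = {}
    exec(program, scope)
    return scope["f"](x)


def learnability_reward(solve_rate: float) -> float:
    """Assumed shape: trivial or impossible tasks earn the proposer nothing."""
    if solve_rate <= 0.0 or solve_rate >= 1.0:
        return 0.0
    return 1.0 - solve_rate


def accuracy_reward(task: Task, answer: int) -> float:
    """Binary reward: 1.0 if the answer matches the executed output."""
    return 1.0 if answer == task.output else 0.0


def propose(rng: random.Random):
    """Hypothetical stand-in for the model's PROPOSE step."""
    a = rng.randint(1, 5)
    program = f"def f(x):\n    return x * {a} + 1"
    x = rng.randint(0, 9)
    try:
        y = run_program(program, x)  # validate the task by executing it
    except Exception:
        return None                  # invalid proposals are discarded
    return Task("deduction", program, x, y)


def solve(task: Task, rng: random.Random, skill: float) -> int:
    """Hypothetical stand-in for SOLVE: correct with probability `skill`."""
    return task.output if rng.random() < skill else task.output + 1


def self_play_step(rng: random.Random, skill: float, n_rollouts: int = 8):
    """One PROPOSE + SOLVE round; returns (proposer reward, solver rewards)."""
    task = propose(rng)
    if task is None:
        return None
    solver_rewards = [
        accuracy_reward(task, solve(task, rng, skill)) for _ in range(n_rollouts)
    ]
    solve_rate = sum(solver_rewards) / n_rollouts
    return learnability_reward(solve_rate), solver_rewards


if __name__ == "__main__":
    print(self_play_step(random.Random(0), skill=0.5))
```

In a real training run both rewards would feed a policy-gradient update of the same model; the sketch stops at computing them.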

Quick Start & Requirements

Installation: Requires Python 3.10, CUDA Toolkit 12.4.1, and pinned versions of vllm (0.7.3) and transformers (4.47.1). Setup involves creating a Conda environment, installing dependencies via pip, and possibly building flash-attn.
Hardware: Training 3B models requires 2x 80GB GPUs, 7B and 8B models require 4x 80GB GPUs, and 14B models require 8x 80GB GPUs.
Links: Project Page, Paper, Models.
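The requirements above can be checked before starting a run. The following is a minimal sketch rather than part of the project: the function names `check_environment` and `gpus_needed` are hypothetical, and the version pins and GPU counts are copied from the requirements listed above.

```python
import sys
from importlib import metadata

# Values taken from the requirements listed above.
REQUIRED_PYTHON = (3, 10)
REQUIRED_PACKAGES = {"vllm": "0.7.3", "transformers": "4.47.1"}
GPUS_80GB_BY_MODEL_SIZE = {"3B": 2, "7B": 4, "8B": 4, "14B": 8}


def check_environment() -> list:
    """Return a list of human-readable problems; empty if everything matches."""
    problems = []
    if sys.version_info[:2] != REQUIRED_PYTHON:
        found = ".".join(map(str, sys.version_info[:2]))
        problems.append(f"Python {found} found, expected 3.10")
    for name, wanted in REQUIRED_PACKAGES.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name} {found} found, expected {wanted}")
    return problems


def gpus_needed(model_size: str) -> int:
    """Number of 80GB GPUs listed for a model size such as '7B'."""
    return GPUS_80GB_BY_MODEL_SIZE[model_size.upper()]


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
    print("GPUs needed for 7B:", gpus_needed("7B"))
```

The check does not cover the CUDA Toolkit or flash-attn, which are better verified with the tools that ship with them.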

Highlighted Details

Reports state-of-the-art performance on code and math reasoning benchmarks without external data.
Demonstrates significant performance gains across various model sizes (3B to 14B) and families (Llama3.1, Qwen2.5).
Supports custom intrinsic reward design for further fine-tuning.
Utilizes a fork of the veRL framework for reinforcement learning training and vLLM for rollouts.

Maintenance & Community

The project is developed by LeapLabTHU. The repository links to WandB logs and gives contact information for a primary author. The roadmap lists planned updates to the evaluation code and the executor.

Licensing & Compatibility

The repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The authors describe the provided Python executor as "very raw" and "not secure for production environments," and plan a more secure implementation. The project is research-oriented, and users assume all risks.
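To make the executor caveat concrete, here is a minimal sketch of running a generated snippet in a separate interpreter process with a timeout. It is not the project's executor, and `run_untrusted` is a hypothetical name. A separate process with a timeout limits runaway loops and crashes, but it is not a sandbox: the snippet can still read files and use the network, so this does not make untrusted code safe.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0):
    """Run `code` in a fresh interpreter; return (ok, stdout or error text).

    Process isolation plus a timeout only. This is NOT a security sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


if __name__ == "__main__":
    print(run_untrusted("print(1 + 1)"))
    print(run_untrusted("while True: pass", timeout_s=0.5))
```

A production setup would typically add operating-system-level isolation, such as containers or restricted user accounts, on top of this.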

Health Check

Last Commit

6 months ago

Responsiveness

Inactive

Star History

17 stars in the last 30 days