Absolute-Zero-Reasoner by LeapLabTHU

Self-play reasoning framework that requires zero external training data

Created 3 months ago
1,629 stars

Top 26.3% on SourcePulse

Project Summary

Absolute Zero Reasoner (AZR) is an open-source framework for training large language models to perform complex reasoning tasks, such as code generation and mathematical problem-solving, entirely through self-play and reinforcement learning, without relying on any external datasets. It targets researchers and developers looking to enhance LLM reasoning capabilities in a data-efficient manner.

How It Works

AZR employs a novel iterative self-play loop in which a single model alternates between two phases, PROPOSE and SOLVE. In PROPOSE, the model generates reasoning tasks of three types (abduction, deduction, and induction), validates them by executing them in Python, and receives a learnability reward. In SOLVE, the model attempts these self-generated tasks and receives an accuracy reward when Python execution verifies its answer. The loop is trained with TRR++ (Task-Relative REINFORCE++), which lets the model improve its reasoning skills autonomously.
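The scoring inside this loop can be sketched in Python as follows. This is a simplified illustration, not the repository's code: all function names are hypothetical, induction tasks are omitted, and the learnability formula (zero reward for tasks the solver always or never solves, otherwise one minus the average solve rate) and the per-(task type, role) reward normalization used for TRR++ reflect a reading of the AZR paper and should be treated as assumptions. The toy executor below is not sandboxed.

```python
from collections import defaultdict
from statistics import mean, pstdev


def run_program(src: str, arg):
    """Execute program source that defines f and return f(arg). NOT sandboxed."""
    namespace: dict = {}
    exec(src, namespace)
    return namespace["f"](arg)


def accuracy_reward(kind: str, program: str, given, prediction) -> float:
    """Solver reward: 1.0 if Python execution confirms the prediction, else 0.0.

    deduction: `given` is the input; the prediction must equal f(input).
    abduction: `given` is the output; any input with f(input) == output counts.
    """
    if kind == "deduction":
        return float(prediction == run_program(program, given))
    if kind == "abduction":
        return float(run_program(program, prediction) == given)
    raise ValueError(f"unsupported task type in this sketch: {kind}")


def learnability_reward(successes: list[bool]) -> float:
    """Proposer reward from several solver attempts at one task (assumed formula)."""
    rate = sum(successes) / len(successes)
    # Always-solved and never-solved tasks teach nothing, so they earn 0.
    return 0.0 if rate in (0.0, 1.0) else 1.0 - rate


def task_relative_advantages(batch: list[tuple[str, str, float]]) -> list[float]:
    """Normalize each reward against its own (task type, role) group.

    `batch` holds (task_type, role, reward) tuples, e.g. ("abduction", "solve", 1.0).
    """
    groups = defaultdict(list)
    for task, role, reward in batch:
        groups[(task, role)].append(reward)
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}
    advantages = []
    for task, role, reward in batch:
        mu, sigma = stats[(task, role)]
        # Small epsilon avoids division by zero when a group's rewards are identical.
        advantages.append((reward - mu) / (sigma + 1e-6))
    return advantages


if __name__ == "__main__":
    prog = "def f(x):\n    return x * 2 + 1"
    print(accuracy_reward("deduction", prog, 3, 7))
    print(accuracy_reward("abduction", prog, 7, 3))
    print(learnability_reward([True, False, False, False]))
    batch = [("deduction", "solve", 1.0), ("deduction", "solve", 0.0),
             ("abduction", "propose", 0.75), ("abduction", "propose", 0.25)]
    print(task_relative_advantages(batch))
```

Normalizing within each (task type, role) group, rather than across the whole batch, is meant to keep reward scales that differ between, say, deduction solving and abduction proposing from drowning each other out.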

Quick Start & Requirements

  • Installation: Requires Python 3.10, CUDA Toolkit 12.4.1, vLLM 0.7.3, and transformers 4.47.1. Setup involves creating a Conda environment, installing dependencies with pip, and building flash-attn if needed.
  • Hardware: Training requires 80GB GPUs: 2 for 3B models, 4 for 7B/8B models, and 8 for 14B models.
  • Links: Project Page, Paper, Models.
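A preflight check for the requirements above might look like the following hypothetical helper script (not shipped with the repository). It restates the version pins and the GPU table from the list; the CUDA Toolkit version is not a Python package, so it is not checked, and the script does not query the actual GPUs on the machine.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins restated from the requirements list; CUDA Toolkit 12.4.1 is not checked here.
PINNED = {"vllm": "0.7.3", "transformers": "4.47.1"}
# Model size in billions of parameters -> number of 80GB GPUs listed for training.
GPUS_80GB_BY_SIZE = {3: 2, 7: 4, 8: 4, 14: 8}


def check_environment(pins: dict[str, str] = PINNED) -> dict[str, str]:
    """Return a readable status for the interpreter and each pinned package."""
    report = {}
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    report["python"] = "ok" if py == "3.10" else f"expected 3.10, found {py}"
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"expected {wanted}, found {found}"
    return report


def gpus_needed(params_billions: int) -> int:
    """Look up the listed 80GB GPU count; sizes without guidance raise ValueError."""
    try:
        return GPUS_80GB_BY_SIZE[params_billions]
    except KeyError:
        raise ValueError(f"no hardware guidance for a {params_billions}B model") from None


if __name__ == "__main__":
    for item, status in check_environment().items():
        print(f"{item}: {status}")
    print("80GB GPUs for a 7B model:", gpus_needed(7))
```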

Highlighted Details

  • Reports state-of-the-art performance on code and math reasoning benchmarks without using any external data.
  • Demonstrates significant performance gains across model sizes (3B to 14B) and model families (Llama 3.1, Qwen2.5).
  • Supports custom intrinsic reward design for further fine-tuning.
  • Utilizes a fork of the veRL framework for reinforcement learning training and vLLM for rollouts.
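As an illustration of what a custom intrinsic reward could look like, the sketch below adds a small bonus for structurally richer proposed programs on top of a base proposer reward. Everything here is hypothetical: the AST-node-count bonus, the cap of 50 nodes, the 0.1 weight, and the function names are assumptions chosen for the example, not the repository's reward interface.

```python
import ast


def complexity_bonus(program: str, cap: int = 50) -> float:
    """Hypothetical intrinsic reward: AST node count of a program, scaled to [0, 1]."""
    try:
        tree = ast.parse(program)
    except SyntaxError:
        return 0.0  # unparsable programs earn no bonus
    n_nodes = sum(1 for _ in ast.walk(tree))
    return min(n_nodes, cap) / cap


def shaped_propose_reward(base_reward: float, program: str, weight: float = 0.1) -> float:
    """Add the weighted intrinsic bonus to a base proposer reward (weight is an assumption)."""
    return base_reward + weight * complexity_bonus(program)


if __name__ == "__main__":
    prog = "def f(x):\n    return x * 2 + 1"
    print(complexity_bonus(prog))
    print(shaped_propose_reward(0.75, prog))
```

Capping the bonus keeps it from dominating the base reward, so the proposer is not pushed toward arbitrarily long programs.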

Maintenance & Community

The project is actively developed by LeapLabTHU. Links to WandB logs and contact information for a primary author are provided. A roadmap indicates planned updates for evaluation code and executor improvements.

Licensing & Compatibility

The repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The bundled Python executor is explicitly described as "very raw" and "not secure for production environments," and secure implementations are planned. The project is research-oriented, and users assume all risks.
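For anyone experimenting with self-generated code before a secure executor exists, process-level isolation is a common first step. The helper below is a hypothetical sketch (run_untrusted is not part of the repository) that runs code in a child Python process with a wall-clock timeout. It guards against hangs and leaked interpreter state only; it is not a security sandbox, since the child can still access files and the network, so production use would need OS-level isolation such as containers.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0) -> tuple[bool, str]:
    """Run code in a separate Python process; return (succeeded, stdout).

    NOT a sandbox: this only bounds runtime and keeps the parent interpreter clean.
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode, ignoring PYTHON* env vars and user site-packages.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs past the limit
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return proc.returncode == 0, proc.stdout


if __name__ == "__main__":
    print(run_untrusted("print(6 * 7)"))
    print(run_untrusted("while True: pass", timeout_s=1.0))
```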

Health Check
  • Last commit: 6 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 1,648 stars in the last 90 days
