Research code for math reasoning without process supervision
This repository provides the official implementation of AlphaMath Almost Zero, a method for mathematical reasoning that uses Monte Carlo Tree Search (MCTS) to generate supervision signals, eliminating the need for human annotations or external LLMs such as GPT-4. It targets researchers and practitioners in AI for education and mathematical problem-solving, offering a self-supervised approach to training math-capable language models.
How It Works
The core innovation lies in using MCTS to automatically generate reasoning steps and evaluate their correctness. This process creates synthetic training data for policy and value models. The MCTS framework explores potential solution paths, and a value model learns to predict the quality of intermediate steps, guiding the search towards correct solutions without relying on pre-existing human-labeled data or external LLM outputs.
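The loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the repository's implementation: the task (reach TARGET from 1 in a few arithmetic steps), the STEPS table, the Node class, and the function names are all hypothetical, and a random rollout stands in for the learned value model. Only the final answer is checked, yet the search still yields a value estimate for every intermediate step.

```python
import math
import random

# Hypothetical toy task: start at 1 and reach TARGET within MAX_DEPTH steps.
# Each "reasoning step" applies one operation; only the final answer is checked.
TARGET = 10
MAX_DEPTH = 4
STEPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2, "+3": lambda x: x + 3}


class Node:
    def __init__(self, state, depth, parent=None):
        self.state = state
        self.depth = depth
        self.parent = parent
        self.children = {}  # step name -> Node
        self.visits = 0
        self.total = 0.0    # sum of backed-up rewards

    @property
    def q(self):
        # Mean backed-up reward: the estimated value of this partial solution.
        return self.total / self.visits if self.visits else 0.0

    def is_terminal(self):
        return self.state == TARGET or self.depth == MAX_DEPTH


def outcome_reward(state):
    # Outcome-only signal: +1 for a correct final answer, -1 otherwise.
    return 1.0 if state == TARGET else -1.0


def select_child(node, c=1.4):
    # UCT: mean value plus an exploration bonus for rarely visited children.
    def uct(child):
        if child.visits == 0:
            return float("inf")
        return child.q + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=uct)


def rollout(state, depth, rng):
    # Random playout to a final answer (assumption: stands in for a value model).
    while state != TARGET and depth < MAX_DEPTH:
        state = STEPS[rng.choice(sorted(STEPS))](state)
        depth += 1
    return outcome_reward(state)


def search(num_simulations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(state=1, depth=0)
    for _ in range(num_simulations):
        node = root
        # 1. Selection: descend while every step has already been tried.
        while not node.is_terminal() and len(node.children) == len(STEPS):
            node = select_child(node)
        # 2. Expansion: add one untried step.
        if not node.is_terminal():
            untried = [s for s in sorted(STEPS) if s not in node.children]
            name = rng.choice(untried)
            child = Node(STEPS[name](node.state), node.depth + 1, parent=node)
            node.children[name] = child
            node = child
        # 3. Evaluation: exact reward at a final answer, playout otherwise.
        if node.is_terminal():
            reward = outcome_reward(node.state)
        else:
            reward = rollout(node.state, node.depth, rng)
        # 4. Backup: propagate the reward to every step on the path.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return root


def collect_value_targets(root):
    # Flatten the tree into (state, depth, q) records for value-model training.
    records, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.visits:
            records.append((node.state, node.depth, node.q))
        stack.extend(node.children.values())
    return records


if __name__ == "__main__":
    tree = search()
    for state, depth, q in sorted(collect_value_targets(tree), key=lambda r: r[1])[:5]:
        print(f"state={state} depth={depth} q={q:+.2f}")
```

In this sketch the records returned by collect_value_targets play the role of synthetic step-level labels: each intermediate state is paired with a value derived purely from final-answer correctness, which is the kind of signal a value model could then be trained to predict.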
Quick Start & Requirements
Setup requires the dependencies in requirements.txt, the evaluation toolkit, and the customized vllm.
Highlighted Details
Maintenance & Community
The project is associated with research from the Qwen team and has several related publications and preprints. Links to related work and citations are provided.
Licensing & Compatibility
The README does not explicitly state a license. Because the project is associated with academic research and the code was extracted from an internal corporate codebase, commercial use may raise licensing questions.
Limitations & Caveats
The README notes that the released code may differ slightly from the internal corporate codebase used to produce the numbers reported in the paper. Due to policy, the training code is released only in part: implementation details are provided for some key functions.