Super_MARIO by MARIO-Math-Reasoning

Research code for math reasoning without process supervision

Created 1 year ago

342 stars

Top 81.1% on SourcePulse

Project Summary

This repository provides the official implementation of AlphaMath Almost Zero, a mathematical reasoning method that uses Monte Carlo Tree Search (MCTS) to generate its own process supervision signals, removing the need for step-level annotations from humans or from external LLMs such as GPT-4. It targets researchers and practitioners working on AI for education and mathematical problem-solving, and offers a self-supervised approach to training math-capable language models.

How It Works

The core idea is to use MCTS to generate reasoning steps automatically and to estimate how likely each step is to lead to a correct answer. The search results become synthetic training data for the policy and value models. During search, MCTS explores candidate solution paths while the value model predicts the quality of intermediate steps, steering the search toward correct solutions without human-labeled or externally generated step-level solutions.
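The search loop described above can be illustrated with a minimal, self-contained sketch. This is not the repository's implementation: the toy task (reach `TARGET` from `START` with `+1` and `*2` steps), the `Node` class, and every function name are hypothetical, and random rollouts stand in for the learned value model. Only the final answer is checked, and the averaged rewards stored on each node play the role of step-level value targets.

```python
import math
import random

# Toy stand-in for a math problem (an assumption for illustration only):
# reach TARGET from START using the steps below. Only the final answer is checked.
START, TARGET, MAX_DEPTH = 1, 10, 5
STEPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


class Node:
    def __init__(self, state, depth, parent=None, action=None):
        self.state, self.depth = state, depth
        self.parent, self.action = parent, action
        self.children = {}
        self.visits, self.total_reward = 0, 0.0

    def is_terminal(self):
        return self.state == TARGET or self.depth == MAX_DEPTH

    def q_value(self):
        # Mean reward of all simulations that passed through this node.
        return self.total_reward / self.visits if self.visits else 0.0


def reward(state):
    return 1.0 if state == TARGET else 0.0


def select_child(node, c=1.4):
    # UCT: trade off the current value estimate against an exploration bonus.
    return max(
        node.children.values(),
        key=lambda ch: ch.q_value() + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(state, depth, rng):
    # Random playout; a learned value model would replace this estimate.
    while state != TARGET and depth < MAX_DEPTH:
        state = rng.choice(list(STEPS.values()))(state)
        depth += 1
    return reward(state)


def mcts(n_simulations=500, seed=0):
    rng = random.Random(seed)
    root = Node(START, 0)
    for _ in range(n_simulations):
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.is_terminal() and len(node.children) == len(STEPS):
            node = select_child(node)
        # Expansion: add one untried step.
        if not node.is_terminal():
            action = rng.choice([a for a in STEPS if a not in node.children])
            child = Node(STEPS[action](node.state), node.depth + 1, node, action)
            node.children[action] = child
            node = child
        # Evaluation: exact reward at terminal nodes, rollout estimate otherwise.
        value = reward(node.state) if node.is_terminal() else rollout(node.state, node.depth, rng)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += value
            node = node.parent
    return root


def collect_value_targets(root):
    # Turn the search tree into (state, depth, value) tuples usable as training targets.
    data, stack = [], [root]
    while stack:
        node = stack.pop()
        data.append((node.state, node.depth, node.q_value()))
        stack.extend(node.children.values())
    return data
```

Calling `collect_value_targets(mcts())` returns one tuple per explored intermediate state. In this sketch those tuples are the analogue of the synthetic supervision signal; how the actual project formats its training data is not shown here.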

Quick Start & Requirements

Installation: Clone the Super_MARIO, MARIO_EVAL, and customized vllm repositories. Install the dependencies listed in requirements.txt, then install the evaluation toolkit and the customized vllm.
Prerequisites: Python, PyTorch. A deepseek-math-7b-base model is required for checkpoint initialization.
Resources: Requires significant computational resources for training and inference, particularly for MCTS.
Links:
- Super_MARIO: https://github.com/MARIO-Math-Reasoning/Super_MARIO
- MARIO_EVAL: https://github.com/MARIO-Math-Reasoning/MARIO_EVAL
- vllm: https://github.com/MARIO-Math-Reasoning/vllm

Highlighted Details

Achieves 69.94% accuracy on the MATH dataset using 5 runs of Step-level Beam Search combined with majority voting.
Offers implementations for Greedy Decoding, Step-level Beam Search, and MCTS-based inference.
Provides pre-trained checkpoints for AlphaMath-7B and SVPO-7B models.
Includes a comprehensive evaluation toolkit (MARIO Eval) for assessing math LLMs.
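As a rough illustration of the inference strategies listed above, the following sketch shows a step-level beam search guided by a value function, plus majority voting over the final answers of several runs. It is a toy under stated assumptions, not the repository's code: the function names, the integer task, the hand-written `value` heuristic (standing in for a learned value model), and the sample answers are all hypothetical.

```python
from collections import Counter


def step_level_beam_search(start, expand, value, is_final, beam_size=2, max_steps=5):
    """Keep the `beam_size` highest-valued partial solutions after every step."""
    beam = [[start]]
    for _ in range(max_steps):
        candidates = []
        for path in beam:
            if is_final(path[-1]):
                candidates.append(path)  # finished paths are carried over unchanged
            else:
                candidates.extend(path + [nxt] for nxt in expand(path[-1]))
        # Rank candidates by the value of their latest step and prune to the beam size.
        beam = sorted(candidates, key=lambda p: value(p[-1]), reverse=True)[:beam_size]
        if all(is_final(p[-1]) for p in beam):
            break
    return beam[0]


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


# Toy task (assumption): reach 14 from 3 using "+1" and "*2" steps.
path = step_level_beam_search(
    start=3,
    expand=lambda x: [x + 1, x * 2],
    value=lambda x: -abs(14 - x),  # heuristic stand-in for a learned value model
    is_final=lambda x: x == 14,
)

# Made-up final answers from five independent runs, for illustration only.
voted = majority_vote(["14", "14", "12", "14", "12"])
```

Unlike MCTS, this search never revisits pruned branches, which makes it cheaper per problem; combining several runs by majority vote is one way to recover some robustness.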

Maintenance & Community

The project is associated with research from the Qwen team and has several related publications and preprints. The README links to the related works and provides citations.

Licensing & Compatibility

The README does not explicitly state a license. The project is tied to academic research and the code is extracted from an internal corporate codebase, so licensing terms for commercial use are unclear.

Limitations & Caveats

The README notes that the released code may differ slightly from the internal corporate codebase used to produce the numbers reported in the paper. Due to policy, the training code release covers the implementation details of only some key functions.

Health Check

Last Commit: 8 months ago
Responsiveness: 1 day

Star History: 1 star in the last 30 days