rStar by microsoft

Research paper repo for math reasoning in small LLMs via deep thinking

Created 1 year ago

1,375 stars

Top 29.2% on SourcePulse

3 Experts Love This Project

Wing Lian

Founder of Axolotl AI

Travis Addair

Cofounder of Predibase

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

rStar-Math enables small language models (SLMs) to reach state-of-the-art math reasoning, rivaling or surpassing OpenAI's o1-mini, without distillation from larger models. It targets researchers and developers working on LLM reasoning and provides a framework built on self-evolved deep thinking.

How It Works

The core innovation is "deep thinking" via Monte Carlo Tree Search (MCTS). An SLM acts as the policy model that proposes reasoning steps during a test-time search. A second SLM, a process reward model, scores the quality of each step and steers the search toward promising branches. Together they let an SLM explore multiple reasoning paths and abandon weak ones, which makes its mathematical problem-solving more robust.
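
The interplay between policy model, reward model, and tree search can be illustrated with a toy sketch. This is not the repository's code: `toy_policy`, `toy_reward`, `TARGET`, and the arithmetic "reasoning steps" are hypothetical stand-ins for the two SLMs and a real math problem, and the UCT constant is an arbitrary choice.

```python
import math
from dataclasses import dataclass, field

TARGET = 10      # hypothetical goal: reach this number starting from 1
MAX_DEPTH = 4    # hypothetical cap on the number of reasoning steps


def apply_steps(steps):
    """Replay a sequence of steps ("+1", "*2", "+3") starting from 1."""
    x = 1
    for s in steps:
        if s == "+1":
            x += 1
        elif s == "*2":
            x *= 2
        else:  # "+3"
            x += 3
    return x


def is_terminal(steps):
    return apply_steps(steps) >= TARGET or len(steps) >= MAX_DEPTH


def toy_policy(steps):
    """Stand-in for the policy SLM: propose candidate next steps."""
    return ["+1", "*2", "+3"]


def toy_reward(steps):
    """Stand-in for the process reward model: score a partial solution in (0, 1]."""
    return 1.0 / (1.0 + abs(TARGET - apply_steps(steps)))


@dataclass
class Node:
    steps: tuple
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0


def uct(child, parent_visits, c=1.4):
    """Upper confidence bound: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def search(n_iter=200):
    root = Node(steps=())
    for _ in range(n_iter):
        node = root
        # Selection: descend through expanded nodes by UCT score.
        while node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        # Expansion: a leaf that was already evaluated gets its children.
        if node.visits > 0 and not is_terminal(node.steps):
            node.children = [Node(node.steps + (s,), parent=node)
                             for s in toy_policy(node.steps)]
            node = node.children[0]
        # Evaluation: the reward model scores the partial reasoning path.
        reward = toy_reward(node.steps)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Read out the most-visited path as the preferred reasoning chain.
    node, path = root, ()
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        path = node.steps
    return root, path


if __name__ == "__main__":
    root, path = search()
    print(path, apply_steps(path))
```

In this sketch the reward model replaces random rollouts as the leaf evaluator, which is one common way to combine a learned value signal with MCTS; the paper describes the actual procedure.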

Quick Start & Requirements

Environment: Conda with Python 3.11.
Hardware: A100 80G GPU with CUDA 12.4 is recommended.
Installation: pip install -r requirements.txt. Flash-attention 2 is optional.
Evaluation Toolkit: Requires cloning and installing the MARIO_EVAL toolkit.
CUDA Compatibility: A workaround is provided for CUDA versions lower than 12.4.
Docs: Paper
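
A quick way to compare a local setup against the recommendations above is a small check script. This is a sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes PyTorch is the framework installed by requirements.txt. `torch.cuda.is_available()` and `torch.version.cuda` are standard PyTorch attributes.

```python
import sys


def check_environment():
    """Report whether the interpreter and CUDA build match the stated recommendations."""
    report = {"python_ok": sys.version_info[:2] == (3, 11)}
    try:
        import torch  # assumption: PyTorch is among the installed requirements
    except ImportError:
        report["torch_installed"] = False
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    # CUDA version the PyTorch build was compiled against; None for CPU-only builds.
    report["cuda_version"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_version` reports something lower than 12.4, the workaround mentioned above is the relevant path.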

Highlighted Details

Demonstrates SLMs rivaling or surpassing OpenAI's o1-mini on math reasoning.
Utilizes Monte Carlo Tree Search (MCTS) for deep thinking and self-evolution.
Employs an SLM-based process reward model to guide search.
Provides scripts for data generation, SFT/RM training, and inference/evaluation.
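
To make the role of a process reward model concrete, the sketch below scores each step of several candidate solutions and keeps the best one. It is illustrative only: `toy_step_scorer` is a hypothetical rule-based stand-in for the trained SLM reward model, and aggregating step scores with `min` is an assumption made for this example, not necessarily what rStar-Math does.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def toy_step_scorer(prefix):
    """Hypothetical step scorer: 1.0 if the latest step 'a op b = c' is correct, else 0.0."""
    a, op, b, _, c = prefix[-1].split()
    return 1.0 if OPS[op](int(a), int(b)) == int(c) else 0.0


def score_trajectory(steps, step_scorer, aggregate=min):
    """Score every prefix of the solution, then aggregate the per-step scores."""
    scores = [step_scorer(steps[:i + 1]) for i in range(len(steps))]
    return aggregate(scores) if scores else 0.0


def select_best(trajectories, step_scorer, aggregate=min):
    """Return the candidate solution with the highest aggregated process score."""
    return max(trajectories,
               key=lambda t: score_trajectory(t, step_scorer, aggregate))


if __name__ == "__main__":
    candidates = [
        ["3 * 4 = 13", "13 + 5 = 18"],  # wrong first step, consistent afterwards
        ["3 * 4 = 12", "12 + 5 = 18"],  # wrong last step
        ["3 * 4 = 12", "12 + 5 = 17"],  # every step correct
    ]
    print(select_best(candidates, toy_step_scorer))
```

Scoring steps rather than only final answers is what lets the first candidate be rejected even though its last line is internally consistent.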

Maintenance & Community

Actively hiring interns for LLM reasoning research.
Code and paper are open-sourced.
Prior work on "Mutual Reasoning" is available on a separate branch.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

Requires significant GPU resources, particularly for training and for running MCTS at scale.
The setup and training scripts are complex, with specific hardware and CUDA version recommendations.
The license is not specified, which may impact commercial use or integration into closed-source projects.

Health Check

Last Commit

4 months ago

Responsiveness

1 week

Star History

22 stars in the last 30 days