rStar by microsoft

Research paper repo for math reasoning in small LLMs via deep thinking

created 11 months ago
605 stars

Top 54.9% on sourcepulse

Project Summary

rStar-Math enables small language models (SLMs) to reach state-of-the-art math reasoning, rivaling larger models without distillation from stronger models. It targets researchers and developers working on LLM reasoning, and offers a framework for improving performance through self-evolved deep thinking.

How It Works

The core innovation is "deep thinking" via Monte Carlo Tree Search (MCTS). An SLM acts as a policy model, guiding a test-time search. This search is further refined by an SLM-based process reward model, which evaluates the quality of reasoning steps. This approach allows SLMs to explore multiple reasoning paths and self-correct, leading to more robust mathematical problem-solving.
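The search loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the repository's implementation: `policy_propose` and `process_reward` are hypothetical stand-ins for the policy SLM and the process reward model, and the "math problem" is simply reaching a target number from 1 using the steps +3 and *2.

```python
import math
from dataclasses import dataclass, field

TARGET = 10      # toy goal: reach this value starting from 1
MAX_DEPTH = 4    # maximum number of reasoning steps per path


def policy_propose(value):
    """Hypothetical stand-in for the policy SLM: propose candidate next steps."""
    return [("+3", value + 3), ("*2", value * 2)]


def process_reward(value):
    """Hypothetical stand-in for the process reward model: score in [0, 1]."""
    if value > TARGET:
        return 0.0  # overshooting the target is a dead end
    return 1.0 / (1.0 + (TARGET - value))


@dataclass
class Node:
    value: int
    steps: tuple = ()
    parent: "Node | None" = None
    children: list = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def is_terminal(self):
        return self.value >= TARGET or len(self.steps) >= MAX_DEPTH

    def uct(self, c=1.4):
        # Unvisited children are always tried first.
        if self.visits == 0:
            return math.inf
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def search(start=1, iterations=200):
    """Run MCTS and return the shortest step sequence found that hits TARGET."""
    root = Node(value=start)
    best = None
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a node with no children.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: ask the policy for next steps, then move to the first.
        if not node.is_terminal():
            node.children = [
                Node(value=v, steps=node.steps + (name,), parent=node)
                for name, v in policy_propose(node.value)
            ]
            node = node.children[0]
        # 3. Evaluation: score the (partial) path with the reward model.
        reward = process_reward(node.value)
        if node.value == TARGET and (best is None or len(node.steps) < len(best.steps)):
            best = node
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return None if best is None else best.steps


if __name__ == "__main__":
    print(search())
```

In this sketch the reward model replaces random rollouts: each newly expanded step is scored directly, and those scores steer later selections toward promising paths while the exploration term keeps weaker branches from being abandoned entirely.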

Quick Start & Requirements

  • Environment: Conda with Python 3.11.
  • Hardware: A100 80G GPU with CUDA 12.4 is recommended.
  • Installation: pip install -r requirements.txt. Flash-attention 2 is optional.
  • Evaluation Toolkit: Requires cloning and installing the MARIO_EVAL toolkit.
  • CUDA Compatibility: A workaround is provided for CUDA versions lower than 12.4.
  • Docs: Paper

Highlighted Details

  • Demonstrates SLMs rivaling or surpassing OpenAI's o1-mini on math reasoning.
  • Utilizes Monte Carlo Tree Search (MCTS) for deep thinking and self-evolution.
  • Employs an SLM-based process reward model to guide search.
  • Provides scripts for data generation, SFT/RM training, and inference/evaluation.
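
One way a process reward model can guide the choice among finished reasoning paths is to score every step and keep the answer from the best-scoring path. The sketch below assumes per-step scores are already available; the function names, the data layout, and the choice of the minimum step score as the aggregate are all illustrative assumptions, not the repository's method.

```python
def trajectory_score(step_scores):
    """Aggregate per-step scores; using the minimum is an assumption made here."""
    if not step_scores:
        raise ValueError("trajectory has no scored steps")
    return min(step_scores)


def select_answer(candidates):
    """Return the final answer of the highest-scoring candidate trajectory."""
    best = max(candidates, key=lambda c: trajectory_score(c["step_scores"]))
    return best["answer"]


# Hypothetical scores, made up for illustration only.
candidates = [
    {"answer": "42", "step_scores": [0.9, 0.2, 0.95]},
    {"answer": "41", "step_scores": [0.7, 0.6, 0.8]},
]

if __name__ == "__main__":
    print(select_answer(candidates))
```

Taking the minimum penalizes a path with a single weak step even when its other steps score well; a mean or a final-step score would be equally plausible choices.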

Maintenance & Community

  • Actively hiring interns for LLM reasoning research.
  • Code and paper are open-sourced.
  • Prior work on "Mutual Reasoning" is available on a separate branch.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.

Limitations & Caveats

  • Requires significant GPU resources, particularly for training and extensive MCTS.
  • The setup and training scripts are complex, with specific hardware and CUDA version recommendations.
  • The license is not specified, which may impact commercial use or integration into closed-source projects.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 86 stars in the last 90 days
