MAmmoTH by TIGER-AI-Lab

LLM for math problem-solving, targeting generalizability

created 1 year ago
376 stars


Project Summary

MAmmoTH provides open-source large language models (LLMs) specialized for mathematical reasoning. The models are instruction-tuned on data that combines Chain-of-Thought (CoT) and Program-of-Thought (PoT) rationales in a novel hybrid approach. The project targets researchers and developers who want to improve LLM performance on diverse mathematical tasks, and it offers models based on the Llama-2, Code Llama, and Mistral architectures.

How It Works

MAmmoTH models are trained on the MathInstruct dataset, which mixes CoT rationales (step-by-step natural-language reasoning) with PoT rationales (Python programs whose execution yields the answer). At inference time, hybrid decoding first asks the model for a PoT program and executes it; if the program fails to produce an answer, the model falls back to CoT reasoning. This combination aims to improve accuracy and robustness across a wide range of mathematical problems.
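The PoT-first, CoT-fallback idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's implementation: `generate_pot` and `generate_cot` are hypothetical callables standing in for model calls, and a PoT attempt is treated as failed when the generated program raises an exception or prints nothing.

```python
import contextlib
import io
from typing import Callable, Optional


def execute_pot(program: str) -> Optional[str]:
    """Run a generated program and return what it prints, or None on failure.

    Assumption for this sketch: the program prints its final answer.
    WARNING: exec() on model output is unsafe; real use should sandbox it
    and enforce a timeout.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(program, {})  # fresh globals so programs do not share state
    except Exception:
        return None  # execution error -> signal a CoT fallback
    output = buffer.getvalue().strip()
    return output or None  # printing nothing also counts as failure here


def hybrid_decode(
    question: str,
    generate_pot: Callable[[str], str],  # hypothetical: model call returning code
    generate_cot: Callable[[str], str],  # hypothetical: model call returning prose
) -> tuple[str, str]:
    """Try PoT first; fall back to CoT if the program yields no answer."""
    answer = execute_pot(generate_pot(question))
    if answer is not None:
        return "pot", answer
    return "cot", generate_cot(question)


# Usage with stub "models" in place of real generation:
print(hybrid_decode("What is 3*4+2?", lambda q: "print(3 * 4 + 2)", lambda q: "14"))
print(hybrid_decode("What is 3*4+2?", lambda q: "print(1/0)", lambda q: "14"))
```

The first call returns `("pot", "14")` because the program runs cleanly; the second program raises `ZeroDivisionError`, so the sketch falls back to the CoT stub.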

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, Hugging Face transformers and datasets, and vLLM (for optimized inference). A GPU is recommended for training and efficient inference.
  • Demo: Project Page
  • Models and the MathInstruct dataset: available on Hugging Face.
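Once a checkpoint is loaded (for example with transformers or vLLM), each question has to be wrapped in a prompt before generation. The sketch below assumes an Alpaca-style instruction template and assumes that appending "Let's write a program." steers the model toward PoT output; both the template wording and the `build_prompt` helper are hypothetical and should be checked against the repository's inference scripts.

```python
# ASSUMED Alpaca-style template; verify the exact wording in the repo.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{query}\n\n### Response:"
)

# ASSUMED trigger phrase nudging the model toward a program (PoT)
# rather than natural-language steps (CoT).
POT_TRIGGER = "Let's write a program."


def build_prompt(question: str, use_pot: bool = False) -> str:
    """Wrap a math question in the (assumed) instruction template."""
    query = question.strip()
    if use_pot:
        query = f"{query} {POT_TRIGGER}"
    return ALPACA_TEMPLATE.format(query=query)


print(build_prompt("What is 15% of 80?", use_pot=True))
```

Keeping the template in one constant makes it easy to swap in the exact text from the repository without touching the calling code.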

Highlighted Details

  • Offers models ranging from 7B to 70B parameters, based on Llama-2, Code Llama, and Mistral.
  • The 7B Mistral-based variant scores 75.0 on GSM8K and 40.0 on MATH (accuracy, %) using hybrid decoding.
  • Supports both CoT and PoT generation, with a hybrid decoding strategy that prioritizes PoT and falls back to CoT.
  • Includes code for fine-tuning, inference, and large-scale evaluation on various math datasets.
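Evaluation on GSM8K-style benchmarks ultimately comes down to extracting a final answer from each output and comparing it with the reference. The sketch below is a simplified, hypothetical scorer (`extract_last_number` and `accuracy` are illustrative names, not the repository's evaluation code): it takes the last number in each output and accepts it within a small relative tolerance. MATH answers that are expressions or fractions would need a more careful normalizer.

```python
import re
from typing import Optional, Sequence

# Matches integers and decimals, optionally negative, with thousands commas.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[float]:
    """Return the last number in a model output, or None if there is none."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def accuracy(
    predictions: Sequence[str],
    references: Sequence[float],
    tol: float = 1e-4,
) -> float:
    """Fraction of outputs whose last number matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references differ in length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        value = extract_last_number(pred)
        # Relative tolerance, floored at 1.0 so small answers still compare sanely.
        if value is not None and abs(value - ref) <= tol * max(1.0, abs(ref)):
            correct += 1
    return correct / len(references)


print(accuracy(["The answer is 14.", "So x = 3", "unsure"], [14, 4, 1]))
```

Only the first output matches its reference here, so the score is one third.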

Maintenance & Community

The project is maintained by TIGER-AI-Lab. The README does not describe community channels or contribution processes.

Licensing & Compatibility

Licenses for the subsets that make up the MathInstruct dataset vary and include MIT, Apache 2.0, and non-commercial licenses. Commercial use may therefore be restricted depending on which subsets are used.

Limitations & Caveats

Some dataset subsets have unlisted or non-commercial licenses, so review them carefully before any commercial application. Performance also varies considerably with the base model (Llama-2, Code Llama, or Mistral) and the decoding strategy (CoT, PoT, or hybrid).

Health Check
Last commit: 11 months ago

Responsiveness: Inactive

Pull Requests (30d): 0
Issues (30d): 0
Star History: 7 stars in the last 90 days
