MAmmoTH by TIGER-AI-Lab

LLM for math problem-solving, targeting generalizability

Created 2 years ago

378 stars

Top 75.4% on SourcePulse

1 Expert Loves This Project

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

MAmmoTH provides open-source large language models (LLMs) specialized for mathematical reasoning. The models are instruction-tuned on a hybrid of Chain-of-Thought (CoT) and Program-of-Thought (PoT) rationales and are built on Llama-2, Code Llama, and Mistral base models. The project targets researchers and developers who want to improve LLM performance on diverse mathematical tasks.

How It Works

MAmmoTH models are trained on the MathInstruct dataset, which combines CoT and PoT rationales. With hybrid decoding, a model generates executable code (PoT) to solve a problem when feasible and falls back to CoT reasoning otherwise. The aim is to improve accuracy and robustness across a wide range of mathematical problems.
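The PoT-first, CoT-fallback control flow can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `run_program`, `hybrid_decode`, and the two generator callables are hypothetical names, and the model calls are replaced by plain functions so the sketch runs without a GPU.

```python
import contextlib
import io
from typing import Callable, Optional, Tuple


def run_program(code: str) -> Optional[str]:
    """Execute generated code and return what it printed, or None on failure.

    Assumption: the program reports its answer via print(). Real systems
    should run untrusted code in a sandbox with a timeout; exec() is used
    here only to keep the sketch short.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__pot__"})
    except Exception:
        return None
    output = buffer.getvalue().strip()
    return output or None


def hybrid_decode(
    question: str,
    generate_pot: Callable[[str], str],
    generate_cot: Callable[[str], str],
) -> Tuple[str, str]:
    """Try a PoT program first; fall back to CoT text if it yields nothing."""
    result = run_program(generate_pot(question))
    if result is not None:
        return result, "pot"
    return generate_cot(question), "cot"


# Stand-ins for model calls (hypothetical): one good program, one that crashes.
good_pot = lambda q: "print(3 * 4)"
broken_pot = lambda q: "print(1 / 0)"
cot = lambda q: "3 times 4 is 12. The answer is 12"

pot_answer = hybrid_decode("What is 3 * 4?", good_pot, cot)
fallback_answer = hybrid_decode("What is 3 * 4?", broken_pot, cot)
```

In this sketch, "not feasible" is interpreted as "the program raised an error or printed nothing"; the source does not spell out the exact fallback condition.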

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python, Hugging Face transformers, datasets, vllm (for optimized inference). A GPU is recommended for training and efficient inference.
Demo: Project Page
Models: Available on Hugging Face, along with the MathInstruct dataset.

Highlighted Details

Offers models ranging from 7B to 70B parameters, based on Llama-2, Code Llama, and Mistral.
Scores up to 75.0 on GSM8K and up to 40.0 on MATH with the 7B Mistral variant using hybrid decoding.
Supports both CoT and PoT generation, with a hybrid decoding strategy that prioritizes PoT and falls back to CoT.
Includes code for fine-tuning, inference, and large-scale evaluation on various math datasets.
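As a rough illustration of how benchmark scores of this kind are typically computed, the sketch below extracts the last number from each model output and reports the percentage that match the reference answers. It is an assumption-laden example, not the repository's evaluation code: `extract_number`, `accuracy`, and the tolerance value are hypothetical.

```python
import re
from typing import Optional, Sequence


def extract_number(text: str) -> Optional[float]:
    """Return the last number appearing in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None


def accuracy(
    predictions: Sequence[str],
    references: Sequence[float],
    tol: float = 1e-4,
) -> float:
    """Percentage of predictions whose final number matches the reference."""
    correct = 0
    for pred, ref in zip(predictions, references):
        value = extract_number(pred)
        if value is not None and abs(value - ref) <= tol:
            correct += 1
    return 100.0 * correct / len(references)


# Two of the three toy outputs match their reference answers.
score = accuracy(
    ["The answer is 12", "so x = 3.5", "no idea"],
    [12, 3.5, 7],
)
```

Taking the last number is a common heuristic for free-form CoT outputs; a PoT output that prints a single value passes through the same function unchanged.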

Maintenance & Community

The project is associated with TIGER-AI-Lab. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The project's dataset licenses vary, including MIT, Apache 2.0, and non-commercial licenses for some subsets. Commercial use may be restricted depending on the specific dataset components utilized.

Limitations & Caveats

The README notes that some dataset subsets have unlisted or non-commercial licenses, so commercial applications require a careful license review. Performance can vary significantly with the chosen base model and decoding strategy.

Health Check

Last Commit

1 year ago

Responsiveness

1 day

Star History

2 stars in the last 30 days