MathCoder by mathllm

LLM family for enhanced mathematical reasoning via code integration

Created 2 years ago
311 stars

Top 86.5% on SourcePulse

Project Summary

MathCoder is a family of large language models (LLMs) and large multimodal models (LMMs) that improve mathematical reasoning by integrating code generation and execution. It targets researchers and developers working on AI for mathematics and reports improved performance on complex math benchmarks.

How It Works

MathCoder models are fine-tuned using the MathCodeInstruct dataset, which interleaves natural language, code, and execution results. This approach allows the models to generate code-based solutions for mathematical problems, mirroring the functionality of tools like GPT-4's Code Interpreter. The models are trained to reason with code, execute it, and use the output for further reasoning, leading to enhanced problem-solving accuracy.
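The reason-with-code loop described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `<code>` and `<output>` delimiters, the `solve` and `run_code` helpers, and the `fake_model` stub are all hypothetical names chosen for this example, and real use would need a sandbox rather than a bare `exec`.

```python
import contextlib
import io
import re

# Hypothetical delimiter marking code the model wants executed.
# The real models define their own block markers; this one is an assumption.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_code(source: str) -> str:
    """Execute source and return its captured stdout, or an error message."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})  # unsafe outside a sandbox; illustration only
    except Exception as exc:
        return f"Error: {exc!r}"
    return buffer.getvalue().strip()


def solve(generate, question: str, max_rounds: int = 4) -> str:
    """Alternate between model generation and code execution."""
    context = question
    for _ in range(max_rounds):
        step = generate(context)
        context += "\n" + step
        match = CODE_BLOCK.search(step)
        if match is None:
            break  # no code to run, so the model has finished reasoning
        # Feed the execution result back so the next step can build on it.
        context += "\n<output>" + run_code(match.group(1)) + "</output>"
    return context


def fake_model(context: str) -> str:
    """Stand-in for a real model: emits code first, then a final answer."""
    if "<output>" not in context:
        return "Let me compute the sum.<code>print(sum(range(1, 101)))</code>"
    return "The answer is 5050."


transcript = solve(fake_model, "What is 1 + 2 + ... + 100?")
print(transcript)
```

The transcript interleaves natural language, code, and execution output in one growing context, which is the same three-part structure the MathCodeInstruct data is described as having.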

Quick Start & Requirements

  • Deployment: Uses Text Generation Inference (TGI) for serving models.
  • Inference: Uses the inference.py script together with a running TGI API endpoint.
  • Evaluation: Uses the evaluate.py script.
  • Dependencies: Python and TGI. The README does not detail hardware requirements (GPU, CUDA), though they are implied for LLM deployment.
  • Resources: Model weights are available on Hugging Face.
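As a rough illustration of the deployment and inference steps above, a served model can be queried through TGI's `/generate` route. This sketch uses only the Python standard library and does not reproduce the repository's inference.py; the `http://localhost:8080` base URL, the `max_new_tokens` default, and the helper names are assumptions made for the example.

```python
import json
import urllib.request


def build_generate_request(prompt, base_url="http://localhost:8080", max_new_tokens=512):
    """Build a POST request for TGI's /generate route.

    base_url and max_new_tokens are assumed values; adjust them to the deployment.
    """
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt to a running TGI server and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]


# Building the request needs no server; calling generate() does.
request = build_generate_request("What is the sum of the first 100 positive integers?")
print(request.full_url)
```

Keeping request construction separate from the network call makes the payload easy to inspect before a server is available.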

Highlighted Details

  • The related MathGenie model achieves 87.7% accuracy on GSM8K and 55.7% on MATH.
  • MathCoder outperforms ChatGPT-3.5 and PaLM-2 on GSM8K and MATH, and outperforms GPT-4 on the competition-level MATH benchmark.
  • Models are based on Llama-2 and Code Llama architectures (7B, 13B, 34B variants).
  • The MathCoder and CSV (code-based self-verification) papers were accepted at ICLR 2024.

Maintenance & Community

  • Models and datasets are released on Hugging Face.
  • Paper available at arXiv:2310.03731.
  • Work featured by Aran Komatsuzaki.

Licensing & Compatibility

  • The README does not state a license for the models or code. It mentions that datasets and models are released, which implies open availability, but it names no specific license.

Limitations & Caveats

The README does not specify any limitations or caveats regarding the models' performance, potential biases, or unsupported mathematical domains. The licensing status is also unclear, which may impact commercial use.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 7 stars in the last 30 days
