MathCoder by mathllm

LLM family for enhanced mathematical reasoning via code integration

created 1 year ago
302 stars

Top 89.3% on sourcepulse

View on GitHub
Project Summary

MathCoder is a family of LLMs and LMMs designed to enhance mathematical reasoning by integrating code generation and execution capabilities. It targets researchers and developers working on AI for mathematics, offering improved performance on complex math benchmarks.

How It Works

MathCoder models are fine-tuned using the MathCodeInstruct dataset, which interleaves natural language, code, and execution results. This approach allows the models to generate code-based solutions for mathematical problems, mirroring the functionality of tools like GPT-4's Code Interpreter. The models are trained to reason with code, execute it, and use the output for further reasoning, leading to enhanced problem-solving accuracy.
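
The loop below is a minimal illustrative sketch of this code-integrated reasoning, not the repository's actual inference code: it assumes a hypothetical `generate()` wrapper around the model and block delimiters in the spirit of the paper's `<|text|>` / `<|code|>` / `<|execution|>` format.

```python
# Illustrative sketch only; delimiter names and generate() are assumptions.
import contextlib
import io


def run_python(code: str) -> str:
    """Execute a generated code block and capture its stdout (unsandboxed; demo only)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()


def solve(problem: str, generate, max_blocks: int = 8) -> str:
    """Alternate between model generation and code execution until an answer appears."""
    transcript = problem
    for _ in range(max_blocks):
        # The model writes reasoning and may open a code block; stop at end of block.
        block = generate(transcript, stop=["<|endofblock|>"])
        transcript += block + "<|endofblock|>"
        if "<|code|>" in block:
            code = block.split("<|code|>", 1)[1]
            result = run_python(code)
            # Feed the execution output back so the model can reason over it.
            transcript += f"<|execution|>{result}<|endofblock|>"
        else:
            break  # no more code: final natural-language answer reached
    return transcript
```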

Quick Start & Requirements

  • Deployment: Models are served with Text Generation Inference (TGI).
  • Inference: Run the inference.py script against a TGI API endpoint (see the example after this list).
  • Evaluation: Uses the evaluate.py script.
  • Dependencies: Python and TGI. Hardware requirements (GPU, CUDA) are not explicitly listed, but a GPU is implied for serving models of this size.
  • Resources: Model weights are available on Hugging Face.
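
For illustration, a minimal way to query a TGI deployment from Python is sketched below; the endpoint URL, port, and prompt are placeholders, and generation parameters should be tuned for the released models.

```python
# Minimal sketch of a request to a TGI /generate endpoint (URL and prompt are placeholders).
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed local TGI deployment

payload = {
    "inputs": "Question: What is the sum of the first 100 positive integers?\nSolution:",
    "parameters": {"max_new_tokens": 512},
}
response = requests.post(TGI_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["generated_text"])
```

The generated solutions can then be scored with the evaluate.py script mentioned above.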

Highlighted Details

  • The follow-up MathGenie work reaches 87.7% accuracy on GSM8K and 55.7% on MATH.
  • MathCoder outperforms ChatGPT-3.5 and PaLM-2 on GSM8K and MATH, and GPT-4 on the competition-level MATH dataset.
  • Models are based on Llama-2 and Code Llama architectures (7B, 13B, 34B variants).
  • Both the MathCoder and CSV (code-based self-verification) papers were accepted at ICLR 2024.

Maintenance & Community

  • Models and datasets are released on Hugging Face.
  • Paper available at arXiv:2310.03731.
  • Work featured by Aran Komatsuzaki.

Licensing & Compatibility

  • The README does not explicitly state the license for the models or code. It mentions releasing datasets and models, implying open availability but without a specific license.

Limitations & Caveats

The README does not specify any limitations or caveats regarding the models' performance, potential biases, or unsupported mathematical domains. The licensing status is also unclear, which may impact commercial use.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

40 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Woosuk Kwon (author of vLLM), and 11 more.

WizardLM by nlpxucan

LLMs built using Evol-Instruct for complex instruction following

created 2 years ago
updated 1 month ago
9k stars

Top 0.1% on sourcepulse