Math question generation for LLM training and evaluation
MetaMath provides open-source models and datasets for improving Large Language Model (LLM) performance on mathematical reasoning tasks. It targets researchers and developers seeking to enhance LLMs' capabilities in solving math problems, offering significant performance gains on benchmarks like GSM8k and MATH.
How It Works
MetaMath uses data augmentation to generate high-quality mathematical questions, bootstrapping the LLM's learning from an existing set of problems. The approach, inspired by earlier work such as WizardMath and RFT, aims to build a diverse and challenging dataset for fine-tuning base LLMs. The resulting models outperform other open-source LLMs of similar scale.
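The augmentation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MetaMath's actual pipeline: the prompt templates, the `augment` function, and the `rewrite_fn` callback (a stand-in for a call to an LLM) are all hypothetical names introduced here. The sketch only shows the general shape: build rewriting prompts from seed questions, collect the rewrites, and drop duplicates.

```python
from typing import Callable, Dict, List

# Hypothetical prompt templates (assumption: the real prompts are not given
# in this document). Both are meant to preserve the original answer.
TEMPLATES: Dict[str, str] = {
    "rephrase": "Rewrite this math question in different words, "
                "keeping the answer unchanged:\n{question}",
    "formalize": "Restate this math question in a more formal style, "
                 "keeping the answer unchanged:\n{question}",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical questions match."""
    return " ".join(text.lower().split())


def augment(
    seed: List[Dict[str, str]],
    rewrite_fn: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return new question/answer pairs produced by rewriting seed questions.

    `rewrite_fn` is a placeholder for an LLM call: it receives a prompt and
    returns the rewritten question as a string.
    """
    seen = {normalize(ex["question"]) for ex in seed}
    augmented: List[Dict[str, str]] = []
    for ex in seed:
        for name, template in TEMPLATES.items():
            prompt = template.format(question=ex["question"])
            new_question = rewrite_fn(prompt).strip()
            key = normalize(new_question)
            # Skip empty outputs and anything already in the dataset.
            if not new_question or key in seen:
                continue
            seen.add(key)
            augmented.append(
                {"question": new_question, "answer": ex["answer"], "source": name}
            )
    return augmented


def echo_rewrite(prompt: str) -> str:
    """Stub rewriter for demonstration; a real setup would query a model."""
    return "Reworded: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    seed_data = [{"question": "What is 2 + 3?", "answer": "5"}]
    for row in augment(seed_data, echo_rewrite):
        print(row)
```

Passing the model call in as a function keeps the sketch runnable without any API access, and the deduplication step reflects the stated goal of a diverse dataset: a rewrite that matches an existing question adds nothing.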
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt after cloning the repository.
Dependencies include ray and pyarrow, the datasets library, and vllm for fast generation.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The augmented data in MetaMathQA was generated with ChatGPT 3.5, so it may inherit biases or limitations from that model. The provided training script implies specific hardware requirements for training, such as a multi-GPU setup.