LIMO offers a novel approach to mathematical reasoning in large language models, demonstrating that high-quality, curated data can significantly outperform massive datasets. It targets researchers and developers aiming to improve LLM performance on complex reasoning tasks with greater data efficiency.
How It Works
LIMO challenges the "more data is always better" paradigm: fine-tuning on a small, high-quality dataset of 817 curated samples achieves state-of-the-art results on mathematical reasoning benchmarks. This curated approach prioritizes data quality over quantity, yielding better generalization and greater efficiency.
Quick Start & Requirements
- Hugging Face Transformers: `pip install transformers`
- vLLM: `pip install vllm`
- Model: Available on Hugging Face (`GAIR/LIMO`).
- Backbone: Qwen2.5-32B-Instruct.
- Compatibility: Hugging Face Transformers, vLLM, and TensorRT-LLM.
- Quick Start: Python code examples are provided for both Transformers and vLLM (see the inference sketches after this list).
- Training: Uses the LLaMA-Factory framework; requires data preparation and a training configuration.
- Evaluation: Scripts available for inference and evaluation (rule-based and model-based).
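Below is a minimal inference sketch with Hugging Face Transformers. It assumes the `GAIR/LIMO` checkpoint ships a standard chat template (as its Qwen2.5-32B-Instruct backbone does); the prompt and generation settings are illustrative placeholders, not the repository's exact quick-start code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the LIMO checkpoint from the Hugging Face Hub.
model_id = "GAIR/LIMO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # shard the 32B model across available GPUs
)

# Build a chat-style prompt; the system/user phrasing is illustrative.
messages = [
    {"role": "system", "content": "You are a helpful math assistant. Reason step by step."},
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generation hyperparameters are placeholders, not tuned values from the paper.
outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

A corresponding sketch for vLLM follows, again with illustrative sampling parameters; `tensor_parallel_size` is an assumption and should match the number of GPUs available.

```python
from vllm import LLM, SamplingParams

# tensor_parallel_size is an assumption; set it to your GPU count.
llm = LLM(model="GAIR/LIMO", tensor_parallel_size=4)
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=2048)

# For a chat-tuned checkpoint, applying the tokenizer's chat template
# (or using llm.chat) is preferable; a raw prompt is used here for brevity.
prompt = "What is the sum of the first 100 positive integers? Reason step by step."
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```

Either path requires multi-GPU or high-memory hardware, as noted under Limitations & Caveats below.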
Highlighted Details
- Achieves SOTA on multiple benchmarks (AIME24, MATH500, AMC23, etc.) with only 817 training samples.
- Demonstrates strong generalization capabilities, outperforming models trained on 800k+ samples in some cases.
- Released high-quality datasets and evaluation tools for reproducibility.
- Recent updates report AIME 2025 results (a score of 44.6 with only 817 training samples).
Maintenance & Community
Licensing & Compatibility
- MIT License. Permissive for commercial use and closed-source linking.
Limitations & Caveats
- The primary model is a 32B parameter model, requiring significant computational resources for inference and training.
- While LIMO performs strongly across most benchmarks, the authors note slightly lower scores on Minerva and GPQA compared to the previous SOTA.