LIMO by GAIR-NLP

Reasoning model using less data

Created 7 months ago
1,018 stars

Top 36.8% on SourcePulse

View on GitHub
Project Summary

LIMO offers a novel approach to mathematical reasoning in large language models, demonstrating that a small, carefully curated dataset can outperform training on massive datasets. It targets researchers and developers aiming to improve LLM performance on complex reasoning tasks with greater data efficiency.

How It Works

LIMO challenges the "more data is always better" paradigm by using a small, high-quality dataset (817 samples) to achieve state-of-the-art results on mathematical reasoning benchmarks. This curated approach focuses on data quality over quantity, leading to improved generalization and efficiency.

Quick Start & Requirements

  • Hugging Face Transformers: pip install transformers
  • vLLM: pip install vllm
  • Model: Available on Hugging Face (GAIR/LIMO).
  • Backbone: Qwen2.5-32B-Instruct.
  • Compatibility: HF Transformers, vLLM, TensorRT-LLM.
  • Quick Start: Python code examples are provided for both Transformers and vLLM; a minimal sketch follows this list.
  • Training: Utilizes the LLaMA-Factory framework. Requires data preparation and configuration.
  • Evaluation: Scripts available for inference and evaluation (rule-based and model-based).
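
The snippets below are a minimal sketch of the two inference paths listed above, assuming the standard chat-style usage of the GAIR/LIMO checkpoint; the prompt, generation settings, and GPU sizing are illustrative rather than taken from the repository.

```python
# Minimal sketch: generating a solution with Hugging Face Transformers.
# Loading a 32B model this way needs multiple high-memory GPUs or offloading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GAIR/LIMO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the weights across available devices
)

messages = [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

```python
# Minimal sketch: the same prompt served through vLLM.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "GAIR/LIMO"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# tensor_parallel_size=4 is an assumption; match it to your actual GPU count.
llm = LLM(model=model_id, tensor_parallel_size=4)
params = SamplingParams(temperature=0.0, max_tokens=4096)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the sum of the first 100 positive integers?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(llm.generate([prompt], params)[0].outputs[0].text)
```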

Highlighted Details

  • Achieves SOTA on multiple benchmarks (AIME24, MATH500, AMC23, etc.) with only 817 training samples.
  • Demonstrates strong generalization capabilities, outperforming models trained on 800k+ samples in some cases.
  • Released high-quality datasets and evaluation tools for reproducibility.
  • Recent updates report a score of 44.6 on the AIME 2025 evaluation, still using only the 817 training samples.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

  • The primary model has 32B parameters, so inference and training require significant computational resources.
  • While LIMO performs strongly on most benchmarks, the authors note a slight drop in performance on Minerva and GPQA compared to the previous SOTA.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 15 stars in the last 30 days

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research), and 7 more.

Explore Similar Projects

reasoning-gym by open-thought

1.2% · 1k stars
Procedural dataset generator for reasoning models
Created 7 months ago · Updated 3 days ago
Starred by Michael Han (Cofounder of Unsloth), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 19 more.

DeepSeek-R1 by deepseek-ai

0.1% · 91k stars
Reasoning models research paper
Created 8 months ago · Updated 2 months ago