MiMo by XiaomiMiMo

LLM for reasoning, pre-trained and post-trained for math/code tasks

Created 11 months ago
2,023 stars

Top 21.4% on SourcePulse

View on GitHub
Project Summary

MiMo is a series of 7B-parameter language models designed to excel at reasoning tasks, including mathematics and code generation. It targets researchers and developers seeking high-performance models that can compete with much larger architectures, offering a pre-trained base model and fine-tuned versions with enhanced reasoning capabilities.

How It Works

MiMo employs a two-pronged approach: optimized pre-training and a novel post-training recipe. The base model is pre-trained on approximately 25 trillion tokens with a focus on reasoning patterns, incorporating Multiple-Token Prediction (MTP) for improved performance and inference speed. The post-training phase uses a curated dataset of 130K math and code problems, employing rule-based accuracy rewards and a test difficulty-driven reward system to mitigate sparse rewards and stabilize RL training.
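The test difficulty-driven reward idea can be illustrated with a minimal sketch: instead of a sparse pass/fail signal per problem, each test case carries a difficulty weight, and the reward is the fraction of total weight earned. This is an illustrative reconstruction under assumed semantics, not the actual recipe from the technical report; the function name and weighting scheme are hypothetical.

```python
def difficulty_weighted_reward(passed, weights):
    """Dense reward in [0, 1] for a code problem.

    passed  -- list[bool], one entry per test case (True if the
               candidate solution passed that test)
    weights -- difficulty weight per test case; harder tests
               contribute more to the reward

    A fully failing solution gets 0.0, a fully passing one 1.0,
    and partial passes earn proportional credit, which densifies
    the otherwise sparse pass/fail signal during RL training.
    """
    total = sum(weights)
    if total == 0:
        return 0.0
    earned = sum(w for p, w in zip(passed, weights) if p)
    return earned / total
```

Passing only the two easy tests of a three-test problem weighted `[1, 1, 2]` would yield a reward of 0.5 rather than an outright 0 for the failed hard case.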

Quick Start & Requirements

  • Installation: Inference is officially supported via a fork of vLLM (0.7.3). Hugging Face Transformers can also be used.
  • Prerequisites: Python, vLLM (forked version recommended), Hugging Face Transformers.
  • Resources: Requires significant VRAM for 7B models; specific requirements depend on inference setup.
  • Links: HuggingFace Models, Technical Report (Note: Link is illustrative, actual report link may vary).
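For the Transformers path mentioned above, a minimal inference sketch might look like the following. The model ID points at the published MiMo-7B-RL checkpoint on Hugging Face; the generation settings (temperature 0.6, matching the evaluation setup noted below) and helper names are illustrative assumptions, not an official example.

```python
MODEL_ID = "XiaomiMiMo/MiMo-7B-RL"  # published Hugging Face checkpoint


def build_messages(question: str):
    """Wrap a user question in the chat-message format expected by
    tokenizer.apply_chat_template."""
    return [{"role": "user", "content": question}]


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Run one chat turn through MiMo-7B-RL via Transformers.

    Imports are kept local so the lightweight helper above can be
    used without torch/transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_messages(question),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.6,
    )
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
```

Note that calling `generate()` downloads the full 7B checkpoint (roughly 15 GB of weights) and needs a GPU with sufficient VRAM; the vLLM fork remains the officially supported path.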

Highlighted Details

  • MiMo-7B-RL surpasses larger 32B models on several reasoning benchmarks.
  • Achieves performance comparable to OpenAI's o1-mini on math and code tasks.
  • Features a "Seamless Rollout Engine" for accelerated RL training (2.29x faster).
  • Incorporates Multiple-Token Prediction (MTP) for enhanced inference.
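The inference speedup from MTP comes from speculative-style decoding: the extra prediction heads draft several tokens ahead, and the main model keeps the longest prefix it agrees with. The toy acceptance loop below models only that logic (it is not Xiaomi's engine, and a real implementation verifies all draft tokens in a single batched forward pass rather than one call per token).

```python
def speculative_step(draft_tokens, verify_fn):
    """Accept the longest prefix of draft_tokens that the target
    model agrees with.

    draft_tokens -- tokens proposed by the cheap MTP draft heads
    verify_fn    -- target model's greedy next token given the
                    accepted prefix (toy stand-in for a forward pass)

    Returns the accepted draft tokens plus the target model's own
    token at the first disagreement (or one bonus token if every
    draft was accepted), so each step always makes progress.
    """
    accepted = []
    for token in draft_tokens:
        expected = verify_fn(accepted)
        if token == expected:
            accepted.append(token)
        else:
            # Disagreement: fall back to the target model's token.
            accepted.append(expected)
            break
    else:
        # All drafts accepted; target model contributes one more.
        accepted.append(verify_fn(accepted))
    return accepted
```

When the draft heads are usually right, each step emits several tokens for roughly the cost of one target-model pass, which is where the inference speedup comes from.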

Maintenance & Community

  • Developed by the Xiaomi LLM-Core Team.
  • Contact: mimo@xiaomi.com or GitHub issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Evaluation benchmarks were run with temperature=0.6, and some scores are averaged over multiple repetitions.
  • Compatibility with inference engines other than the recommended vLLM fork has not been verified.
Health Check

  • Last Commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 89 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

  • 4k stars
  • RL recipe for reasoning ability in models
  • Created 1 year ago; updated 3 months ago