LLM for reasoning, pre-trained and post-trained for math/code tasks
MiMo is a series of 7B-parameter language models designed to excel at reasoning tasks, including mathematics and code generation. It targets researchers and developers seeking high-performance models that can compete with much larger ones, offering a pre-trained base model alongside post-trained variants with enhanced reasoning capabilities.
How It Works
MiMo employs a dual-pronged approach: optimized pre-training and a novel post-training recipe. The base model is pre-trained on approximately 25 trillion tokens with a data mixture that emphasizes reasoning patterns, and it incorporates Multiple-Token Prediction (MTP) as an additional objective to improve quality and speed up inference. The post-training phase uses a curated dataset of 130K verifiable math and code problems, employing rule-based accuracy rewards and a test-difficulty-driven reward scheme (sketched below) to mitigate the sparse-reward problem and stabilize RL training.
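To make the reward design concrete, here is a minimal Python sketch of what a rule-based accuracy reward (for math) and a difficulty-weighted test reward (for code) could look like. The function names and answer-extraction convention are illustrative assumptions, not MiMo's actual implementation.

```python
import re

def extract_boxed_answer(completion: str) -> str | None:
    # Pull the last \boxed{...} expression from a completion.
    # Sketch only: nested braces are not handled.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def math_accuracy_reward(completion: str, ground_truth: str) -> float:
    # Rule-based reward: 1.0 only on an exact match against the
    # reference answer; no partial credit, no learned reward model.
    answer = extract_boxed_answer(completion)
    return 1.0 if answer is not None and answer == ground_truth else 0.0

def difficulty_weighted_code_reward(passed: list[bool],
                                    weights: list[float]) -> float:
    # Dense reward for code: each test case contributes weight
    # proportional to its difficulty, so a partially correct program
    # still receives signal instead of a sparse all-or-nothing score.
    assert len(passed) == len(weights)
    total = sum(weights)
    earned = sum(w for ok, w in zip(passed, weights) if ok)
    return earned / total if total else 0.0
```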
Quick Start & Requirements
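The upstream quick-start instructions are not reproduced here. As a rough sketch, assuming the RL-tuned checkpoint is published on Hugging Face under XiaomiMiMo/MiMo-7B-RL (verify against the official model card), inference with transformers might look like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model ID is an assumption; check the official model card before use.
model_id = "XiaomiMiMo/MiMo-7B-RL"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick bf16/fp16 automatically where supported
    device_map="auto",       # shard across available GPUs (needs accelerate)
    trust_remote_code=True,  # custom model code such as the MTP layers
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Serving through vLLM is also an option for higher-throughput inference; consult the repository for the recommended engine version and whether MTP-based speculative decoding is supported there.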
Highlighted Details
Maintenance & Community
The repository was last updated about a month ago and is currently flagged as inactive.
Licensing & Compatibility
Limitations & Caveats