MiMo-V2-Flash by XiaomiMiMo

Efficient MoE foundation model for reasoning, coding, and agents

Created 3 weeks ago

975 stars

Top 37.8% on SourcePulse

View on GitHub

Project Summary

MiMo-V2-Flash is an efficient Mixture-of-Experts (MoE) foundation model built for high-speed reasoning, coding, and agentic workflows. It targets developers and researchers who want state-of-the-art performance at significantly reduced inference cost, balancing long-context modeling with inference efficiency for complex task execution.

How It Works

This MoE model has 309B total parameters (15B active) and uses a Hybrid Attention architecture that interleaves Sliding Window Attention (SWA) and Global Attention (GA) layers in a 5:1 ratio with a 128-token window. The design cuts KV-cache storage by roughly 6x while preserving long-context performance via a learnable attention sink bias. A lightweight Multi-Token Prediction (MTP) module (0.33B parameters per block) triples output speed during inference and aids RL training. Post-training combines Multi-Teacher On-Policy Distillation (MOPD) with scaled agentic RL to strengthen reasoning and agentic capabilities.
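
As a rough check on the ~6x figure, the sketch below compares per-request KV-cache size for six all-global layers against the 5:1 SWA/GA interleaving at the full 256k context. The head count, head dimension, and dtype bytes are illustrative placeholders, not the model's published configuration.

# Back-of-the-envelope KV-cache comparison for the 5:1 SWA/GA interleaving.
# num_kv_heads, head_dim, and dtype_bytes are illustrative placeholders,
# not MiMo-V2-Flash's published configuration.

def kv_cache_bytes(cached_positions, num_layers, num_kv_heads=8, head_dim=128, dtype_bytes=1):
    """Bytes to store K and V for `cached_positions` tokens across `num_layers` layers."""
    return 2 * cached_positions * num_layers * num_kv_heads * head_dim * dtype_bytes

context_len = 256 * 1024       # advertised 256k context
window = 128                   # SWA window from the model description
swa_layers, ga_layers = 5, 1   # 5:1 interleaving ratio

full_global = kv_cache_bytes(context_len, swa_layers + ga_layers)   # all-GA baseline
hybrid = (kv_cache_bytes(window, swa_layers)         # SWA layers cache only the window
          + kv_cache_bytes(context_len, ga_layers))  # the GA layer caches everything

print(f"KV-cache reduction ~= {full_global / hybrid:.1f}x")  # ~6.0x at 256k context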

Quick Start & Requirements

Installation requires a specific pre-release SGLang build:

pip install sglang==0.5.6.post2.dev8005+pr.15207.g39d5bd57a --index-url https://sgl-project.github.io/whl/pr/ --extra-index-url https://pypi.org/simple

The server is launched with distributed inference parameters:

SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server ...

Prerequisites include FP8 mixed-precision support and a compatible GPU setup. The model supports up to a 256k context length. Links to Hugging Face and the technical reports are provided.
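
Once the server is running, it can be queried through SGLang's OpenAI-compatible HTTP API. A minimal sketch, assuming the default port 30000 and a hypothetical served model name:

import requests

# Minimal sketch of querying the launched SGLang server via its OpenAI-compatible
# endpoint. The port (30000) and the model name are assumptions; adjust them to
# match your launch command.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "XiaomiMiMo/MiMo-V2-Flash",  # hypothetical served model name
        "messages": [{"role": "user", "content": "Write a function that reverses a string."}],
        "max_tokens": 256,
        "temperature": 0.6,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])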

Highlighted Details

  • 256k context length with 309B total / 15B active parameters.
  • Hybrid Attention reduces KV-cache by ~6x.
  • Multi-Token Prediction (MTP) triples generation speed (see the sketch after this list).
  • Strong post-training performance on SWE-Bench and complex reasoning tasks.
  • Trained on 27T tokens using FP8 mixed precision.
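
For intuition on the MTP speed-up, the sketch below shows a generic draft-and-verify loop: a cheap MTP head proposes a few tokens ahead, and the full model verifies them in a single pass, so several tokens are committed per expensive forward step. The functions are toy stand-ins, not MiMo-V2-Flash's actual decoding code.

import random

# Toy draft-and-verify loop illustrating why a Multi-Token Prediction head
# accelerates decoding. mtp_draft and model_verify are hypothetical stand-ins.

def mtp_draft(context, k=3):
    """Cheap MTP head: propose the next k tokens from the current context."""
    return [f"tok{len(context) + i}" for i in range(k)]

def model_verify(context, drafts):
    """Full model: one forward pass checks drafts left to right; accept the prefix
    up to the first mismatch, then emit one token of its own (progress >= 1)."""
    accepted = []
    for tok in drafts:
        if random.random() < 0.8:  # toy stand-in for "draft matches the model's choice"
            accepted.append(tok)
        else:
            break
    return accepted + [f"tok{len(context) + len(accepted)}"]

context, passes = [], 0
while len(context) < 30:
    drafts = mtp_draft(context)               # several cheap draft tokens
    context += model_verify(context, drafts)  # one expensive verification pass
    passes += 1

print(f"{len(context)} tokens in {passes} full-model passes "
      f"(~{len(context) / passes:.1f} tokens per pass)")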

Maintenance & Community

Contact is available via email (mimo@xiaomi.com), a WeChat group, and GitHub issues. Community engagement is facilitated through the GitHub repository.

Licensing & Compatibility

The README does not specify a software license. Users must verify the licensing terms before adoption, especially for commercial or closed-source integration.

Limitations & Caveats

Benchmark comparisons show MiMo-V2-Flash-Base underperforming some competitors on certain general, coding, and multilingual tasks. The model has a knowledge cutoff of December 2024. The quick start guide requires a specific, potentially unstable, pre-release version of SGLang.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 14
  • Star History: 993 stars in the last 27 days
