XiaomiMiMo: Efficient MoE foundation model for reasoning, coding, and agents
MiMo-V2-Flash is an efficient Mixture-of-Experts (MoE) foundation model designed for high-speed reasoning, coding, and agentic workflows. It targets developers and researchers who want state-of-the-art performance at significantly reduced inference cost, balancing long-context capability with inference efficiency for complex task execution.
How It Works
This 309B total-parameter (15B active) MoE model uses a Hybrid Attention Architecture, interleaving Sliding Window Attention (SWA) layers with Global Attention (GA) layers at a 5:1 ratio, using a 128-token window. This design cuts KV-cache storage by roughly 6x while preserving long-context performance via a learnable attention sink bias. Additionally, a lightweight Multi-Token Prediction (MTP) module (0.33B params per block) triples output speed during inference and aids RL training. Post-training employs Multi-Teacher On-Policy Distillation (MOPD) and scaled agentic RL to strengthen reasoning and agentic capabilities.
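To make the KV-cache saving concrete, here is a rough back-of-the-envelope sketch. It assumes every layer stores key/value states at the same per-token cost; the 5:1 layer ratio and 128-token window come from the description above, while the 131,072-token context length is purely illustrative.

```bash
# Back-of-the-envelope KV-cache estimate for the 5:1 SWA:GA interleave.
# Assumes uniform per-token KV cost per layer; the context length is illustrative.
CTX=131072     # example long-context length in tokens
WINDOW=128     # sliding-window size
SWA=5; GA=1    # SWA:GA layer ratio per interleave group
hybrid=$(( SWA * WINDOW + GA * CTX ))    # cached tokens per 6-layer group (hybrid)
full=$(( (SWA + GA) * CTX ))             # cached tokens per 6-layer group (all global)
echo "scale=1; $full / $hybrid" | bc     # prints 5.9, i.e. the ~6x reduction
```

The saving grows with context length and tops out near 6x, since only one layer in every six must cache the full sequence; at contexts close to the 128-token window the hybrid layout caches roughly as much as full global attention.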
Quick Start & Requirements
Installation requires a specific pre-release SGLang build: pip install sglang==0.5.6.post2.dev8005+pr.15207.g39d5bd57a --index-url https://sgl-project.github.io/whl/pr/ --extra-index-url https://pypi.org/simple. The server is launched with SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server ... plus distributed-inference parameters (see the sketch below). Prerequisites include FP8 mixed-precision support and a compatible GPU setup. The model supports up to a 256k context length. Links to HuggingFace and technical reports are provided.
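For copy-paste convenience, here is a minimal install-and-launch sketch based on the commands above. The pinned SGLang build and the SGLANG_ENABLE_SPEC_V2=1 prefix come from the quick start; the model path, tensor-parallel size, and port are assumptions, since the README elides the exact distributed-inference flags.

```bash
# Pinned pre-release SGLang build (verbatim from the quick start above).
pip install sglang==0.5.6.post2.dev8005+pr.15207.g39d5bd57a \
    --index-url https://sgl-project.github.io/whl/pr/ \
    --extra-index-url https://pypi.org/simple

# Launch sketch: the model path, --tp size, and port below are assumptions,
# not taken from the README; adjust them to your checkpoint and GPU topology.
SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server \
    --model-path XiaomiMiMo/MiMo-V2-Flash \
    --tp 8 \
    --trust-remote-code \
    --port 30000
```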
Maintenance & Community
Contact is available via email (mimo@xiaomi.com), a WeChat group, and GitHub issues. Community engagement is facilitated through the GitHub repository.
Licensing & Compatibility
The README does not specify a software license. Users must verify licensing terms for adoption, especially for commercial or closed-source integration.
Limitations & Caveats
Benchmark comparisons show MiMo-V2-Flash-Base underperforming some competitors on specific general, coding, and multilingual tasks. The model has a knowledge cutoff date of December 2024. The quick start guide mandates a specific, potentially unstable, pre-release version of SGLang.