Ultra-efficient LLMs for end devices, achieving 5x+ speedup
MiniCPM offers a suite of highly efficient, small-parameter language models designed for deployment on end devices. It targets developers and researchers who need capable LLMs under tight resource constraints, such as edge computing and low-latency applications. The models demonstrate competitive performance against larger counterparts, enabling advanced AI features on consumer hardware.
How It Works
MiniCPM models combine architectural and training innovations to achieve high efficiency. Key among them are the MiniCPM-S variant, which cuts FFN compute through activation-sparse FFN layers (up to 87.89% sparsity), and LLMxMapReduce, which enables processing of theoretically infinite context lengths. Together these techniques preserve strong performance while drastically reducing computational requirements.
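The activation-sparsity idea is easy to picture: with a ReLU-family activation, most intermediate FFN units are exactly zero, so their rows of the down-projection can be skipped at inference time. A toy sketch, not MiniCPM-S's actual implementation (all names and sizes here are illustrative):

```python
# Toy illustration of an activation-sparse FFN (not MiniCPM-S's real code).
import torch
import torch.nn as nn

class SparseFFN(nn.Module):
    def __init__(self, d_model: int = 64, d_ff: int = 256):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor):
        h = torch.relu(self.up(x))                 # many units become exactly 0
        sparsity = (h == 0).float().mean().item()  # fraction of skippable work
        return self.down(h), sparsity

_, sparsity = SparseFFN()(torch.randn(1, 64))
# Roughly 50% at random init; MiniCPM-S reports up to 87.89% after training.
print(f"FFN activation sparsity: {sparsity:.2%}")
```

Realizing the FLOP savings in practice requires sparsity-aware kernels; the sketch only measures how much of the FFN could be skipped.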
Quick Start & Requirements
Transformers: pip install transformers accelerate torch, then load the model with AutoModelForCausalLM.from_pretrained('openbmb/MiniCPM3-4B', torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True); a runnable sketch follows.
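Putting the snippet above into a minimal end-to-end example: it assumes a CUDA-capable GPU and the chat template bundled with the model, and the prompt and generation settings are illustrative rather than project defaults:

```python
# Minimal sketch: load MiniCPM3-4B with Transformers and run one chat turn.
# Assumes a CUDA GPU with enough memory for the bfloat16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "openbmb/MiniCPM3-4B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a haiku about edge devices."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)  # illustrative settings
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```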
SGLang: pip install sglang (building from source is recommended for the latest optimizations).
vLLM: pip install vllm>=0.6.2; a usage sketch follows.
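A minimal offline-inference sketch with vLLM; the sampling parameters are illustrative, and trust_remote_code is assumed to be required for MiniCPM3, as with Transformers:

```python
# Minimal vLLM sketch (assumes vllm>=0.6.2; sampling settings are illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="openbmb/MiniCPM3-4B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)
outputs = llm.generate(["Explain why small LLMs matter for edge devices."], params)
print(outputs[0].outputs[0].text)
```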
For source builds, run make.
Highlighted Details
Maintenance & Community
The project is actively developed by OpenBMB, with notable integrations into SGLang and llama.cpp. Community engagement is encouraged via Discord and WeChat groups.
Licensing & Compatibility
Code is licensed under Apache-2.0. Model weights require adherence to the MiniCPM Model Commercial License Agreement, with free commercial use granted after registration via a questionnaire. Academic research use is fully open.
Limitations & Caveats
While MiniCPM models are highly efficient, performance can vary with the specific hardware and inference framework. The README also notes a version split in its comparisons: non-MiniCPM models were run on vLLM 0.2.7, while the MiniCPM implementations are based on vLLM 0.2.2, so results across frameworks may not be directly comparable.