Sub-billion parameter LLM training code for on-device use
MobileLLM provides training code for sub-billion parameter language models optimized for on-device applications. It targets researchers and developers seeking efficient LLMs with competitive performance on commonsense reasoning tasks, offering significant accuracy improvements over existing models in its size class.
How It Works
MobileLLM combines several architectural design choices: SwiGLU activations, a deep-and-thin network shape, input/output embedding sharing, and grouped-query attention. Together these aim to maximize accuracy within a tight parameter budget, yielding notable gains on zero-shot commonsense reasoning benchmarks over similarly sized models.
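The PyTorch sketch below is illustrative only (it is not the repository's code): it shows how a deep-and-thin decoder can combine a SwiGLU feed-forward layer, grouped-query attention in which several query heads share one key/value head, and tied input/output embeddings. Class names and all dimensions are placeholder assumptions.

```python
# Illustrative sketch of the design choices MobileLLM combines; not the repo's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))


class GroupedQueryAttention(nn.Module):
    """Causal attention where groups of query heads share one key/value head."""
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.kv = nn.Linear(dim, 2 * n_kv_heads * self.head_dim, bias=False)
        self.o = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        kv = self.kv(x).view(b, t, 2 * self.n_kv_heads, self.head_dim).transpose(1, 2)
        k, v = kv.chunk(2, dim=1)
        # Repeat K/V heads so each group of query heads attends to shared K/V.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o(out.transpose(1, 2).reshape(b, t, -1))


class Block(nn.Module):
    def __init__(self, dim, n_heads, n_kv_heads, ffn_hidden):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)  # stand-in norm
        self.attn = GroupedQueryAttention(dim, n_heads, n_kv_heads)
        self.ffn = SwiGLU(dim, ffn_hidden)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        return x + self.ffn(self.norm2(x))


class TinyLM(nn.Module):
    """Deep-and-thin decoder with tied input/output embeddings (embedding sharing)."""
    def __init__(self, vocab=32000, dim=576, n_layers=30, n_heads=9, n_kv_heads=3):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            [Block(dim, n_heads, n_kv_heads, ffn_hidden=4 * dim)  # placeholder expansion
             for _ in range(n_layers)]
        )
        self.norm = nn.LayerNorm(dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        self.lm_head.weight = self.embed.weight  # share embedding and output weights

    def forward(self, tokens):
        x = self.embed(tokens)
        for blk in self.blocks:
            x = blk(x)
        return self.lm_head(self.norm(x))


if __name__ == "__main__":
    logits = TinyLM()(torch.randint(0, 32000, (1, 16)))
    print(logits.shape)  # torch.Size([1, 16, 32000])
```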
Quick Start & Requirements
pip install -r requirement.txt
Highlighted Details
pretrain.sh supports multi-node training launches via torchrun; a minimal sketch of the distributed setup such a launch drives is shown below.
An eval.sh script is provided for evaluation.
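As a rough illustration of the multi-node detail above, here is a generic sketch of the process-group setup that a torchrun launch drives; the script name, node counts, and rendezvous endpoint are hypothetical, and the repository's pretrain.sh defines the real arguments.

```python
# Generic distributed-init pattern for a script launched by torchrun; not the repo's code.
# Example (hypothetical) launch across 2 nodes with 8 GPUs each:
#   torchrun --nnodes=2 --nproc_per_node=8 \
#            --rdzv_backend=c10d --rdzv_endpoint=<head-node>:29500 train_sketch.py
import os
import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")  # ranks/world size read from the env

    # ... build the model, wrap it in DistributedDataParallel, run the training loop ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```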
Maintenance & Community
Licensing & Compatibility
Released under the FAIR NC (noncommercial) license.
Limitations & Caveats
The FAIR NC license may impose restrictions on commercial use. The training setup requires significant multi-GPU resources and specific data preprocessing.