MobileLLM by facebookresearch

Sub-billion parameter LLM training code for on-device use

created 1 year ago
1,313 stars

Top 31.1% on sourcepulse

View on GitHub
Project Summary

MobileLLM provides training code for sub-billion parameter language models optimized for on-device applications. It targets researchers and developers seeking efficient LLMs with competitive performance on commonsense reasoning tasks, offering significant accuracy improvements over existing models in its size class.

How It Works

MobileLLM integrates several architectural and training optimizations: SwiGLU activation, deep and thin architectures, embedding sharing, and grouped-query attention. This combination aims to maximize performance within a constrained parameter budget, leading to notable accuracy gains on zero-shot commonsense reasoning benchmarks compared to similarly sized models.
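
To make these ingredients concrete, the sketch below shows how a SwiGLU feed-forward block and a grouped-query attention layer are typically written in PyTorch. It is an illustrative reconstruction of the techniques named above, not MobileLLM's actual code; the class names, dimensions, and head counts are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLU(nn.Module):
        """Feed-forward block with the SwiGLU activation: W2(SiLU(W1 x) * W3 x)."""
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
            self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
            self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.w2(F.silu(self.w1(x)) * self.w3(x))

    class GroupedQueryAttention(nn.Module):
        """Causal self-attention with fewer key/value heads than query heads (GQA)."""
        def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
            super().__init__()
            assert n_heads % n_kv_heads == 0
            self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
            self.head_dim = dim // n_heads
            self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
            self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
            self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
            self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, t, _ = x.shape
            q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
            v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
            # Expand K/V so each group of query heads shares one key/value head.
            rep = self.n_heads // self.n_kv_heads
            k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
            out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            return self.wo(out.transpose(1, 2).reshape(b, t, -1))

    # Illustrative "deep and thin" dimensions; embedding sharing would further
    # tie the LM head to the token embedding (lm_head.weight = tok_emb.weight).
    x = torch.randn(2, 16, 512)
    y = SwiGLU(dim=512, hidden_dim=1408)(GroupedQueryAttention(512, n_heads=8, n_kv_heads=2)(x))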

Quick Start & Requirements

  • Install: pip install -r requirement.txt
  • Prerequisites: Python 3.9, PyTorch >= 2.0. Requires data preprocessed into a specific directory structure (see README). Training is designed for multi-GPU setups (e.g., one or more nodes with 8 GPUs each).
  • Resources: Training on 1T tokens with 32 NVIDIA A100 80GB GPUs takes approximately 3-18 days, depending on model size (125M to 1.5B parameters).
  • Links: HuggingFace Models, Data Prep (a minimal checkpoint-loading sketch follows below)
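
The repository itself provides the training pipeline; for quick inference with the released checkpoints, the Hugging Face models linked above can usually be loaded through transformers. The sketch below is illustrative only; the model ID, dtype, and the trust_remote_code flag are assumptions to verify against the individual model cards.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed model ID; 350M/600M/1B/1.5B variants are also listed on the Hub.
    model_id = "facebook/MobileLLM-125M"

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # half precision keeps the memory footprint small
        trust_remote_code=True,      # may be required if the checkpoint ships custom model code
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))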

Highlighted Details

  • Achieves 2.7%/4.3% accuracy improvements over the previous state-of-the-art 125M/350M models on zero-shot commonsense reasoning.
  • Design philosophy scales to larger models (600M, 1B, 1.5B) with SoTA results.
  • Training script pretrain.sh supports multi-node configurations via torchrun.
  • Evaluation script eval.sh is provided.

Licensing & Compatibility

  • License: FAIR Noncommercial Research License (FAIR NC). This is not an OSI-approved open-source license and restricts commercial use; review the terms before adopting the code or models in a product.

Limitations & Caveats

The FAIR NC license restricts commercial use. The training setup requires substantial multi-GPU resources and data preprocessed into the repo's expected directory structure.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 26 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse · 806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago · updated 2 weeks ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 3 more.

modded-nanogpt by KellerJordan

Top 2.6% on sourcepulse · 3k stars
Language model training speedrun on 8x H100 GPUs
created 1 year ago · updated 2 weeks ago