MobileLLM by facebookresearch

Sub-billion parameter LLM training code for on-device use

Created 1 year ago
1,350 stars

Top 29.8% on SourcePulse

View on GitHub
Project Summary

MobileLLM provides training code for sub-billion parameter language models optimized for on-device applications. It targets researchers and developers seeking efficient LLMs with competitive performance on commonsense reasoning tasks, offering significant accuracy improvements over existing models in its size class.

How It Works

MobileLLM integrates several architectural and training optimizations: SwiGLU activation, deep and thin architectures, embedding sharing, and grouped-query attention. This combination aims to maximize performance within a constrained parameter budget, leading to notable accuracy gains on zero-shot commonsense reasoning benchmarks compared to similarly sized models.
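For intuition, the sketch below shows what two of these components, a SwiGLU feed-forward block and grouped-query attention, look like when built from standard PyTorch modules. It is a minimal illustrative sketch, not the repository's actual layer code; the dimensions, module names, and head counts are assumptions.

```python
# Minimal sketch (not the repo's code): a SwiGLU feed-forward block and
# grouped-query attention, two of the components MobileLLM combines.
# Dimensions and head counts below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU activation: (SiLU(x W_g) * x W_u) W_d."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class GroupedQueryAttention(nn.Module):
    """Attention where several query heads share one key/value head,
    shrinking the KV projections (and KV cache) under a tight parameter budget."""

    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = dim // n_heads
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seqlen, _ = x.shape
        q = self.q_proj(x).view(bsz, seqlen, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so it is shared by n_heads // n_kv_heads query heads.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, seqlen, -1)
        return self.o_proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)
    print(GroupedQueryAttention(dim=512, n_heads=8, n_kv_heads=2)(x).shape)  # (2, 16, 512)
    print(SwiGLU(dim=512, hidden_dim=1408)(x).shape)                         # (2, 16, 512)
```

Embedding sharing, by contrast, typically amounts to tying the input embedding matrix to the output projection so both reuse a single weight tensor, which saves a meaningful fraction of the parameters at sub-billion scale.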

Quick Start & Requirements

  • Install: pip install -r requirement.txt
  • Prerequisites: Python 3.9, PyTorch >= 2.0. Data must be preprocessed into the expected directory structure (see README). Training is designed for multi-GPU setups (e.g., a single node with 8 GPUs).
  • Resources: Training on 1T tokens with 32 NVIDIA A100 80GB GPUs takes approximately 3 to 18 days, depending on model size (125M to 1.5B parameters).
  • Links: HuggingFace Models, Data Prep (a minimal checkpoint-loading sketch follows below)
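For inference with the released checkpoints (as opposed to running the training code), the sketch below uses only the standard transformers API. It assumes the facebook/MobileLLM-125M checkpoint on HuggingFace is accessible to you and that the license terms have been accepted; trust_remote_code is passed in case the checkpoint ships custom modeling code.

```python
# Minimal inference sketch. Assumptions: the facebook/MobileLLM-125M checkpoint on
# HuggingFace is accessible to you and its license terms have been accepted.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-125M"  # assumed checkpoint name; see the HuggingFace Models link
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```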

Highlighted Details

  • Achieves 2.7%/4.3% accuracy improvements over prior state-of-the-art 125M/350M models on zero-shot commonsense reasoning.
  • Design philosophy scales to larger models (600M, 1B, 1.5B) with SoTA results.
  • Training script pretrain.sh supports multi-node configurations via torchrun.
  • Evaluation script eval.sh is provided.

Licensing & Compatibility

  • License: FAIR NC (noncommercial). This is not a standard OSI-approved license and restricts commercial use; review the terms before adopting the code or checkpoints.

Limitations & Caveats

The FAIR NC license restricts commercial use. Training also requires substantial multi-GPU resources and data preprocessed into the expected directory structure.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 44 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

Top 10.6% · 2k stars
Speculative decoding research paper for faster LLM inference
Created 1 year ago · Updated 1 week ago