MobileLLM by facebookresearch

Sub-billion parameter LLM training code for on-device use

created 1 year ago
1,313 stars

Top 31.1% on sourcepulse

View on GitHub
Project Summary

MobileLLM provides training code for sub-billion parameter language models optimized for on-device applications. It targets researchers and developers seeking efficient LLMs with competitive performance on commonsense reasoning tasks, offering significant accuracy improvements over existing models in its size class.

How It Works

MobileLLM integrates several architectural and training optimizations: SwiGLU activation, deep and thin architectures, embedding sharing, and grouped-query attention. This combination aims to maximize performance within a constrained parameter budget, leading to notable accuracy gains on zero-shot commonsense reasoning benchmarks compared to similarly sized models.
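
To make these ingredients concrete, the sketch below shows how a SwiGLU feed-forward block and a grouped-query attention layer are typically written in PyTorch. It is an illustrative reconstruction of the techniques named above, not MobileLLM's actual code; the class names, dimensions, and head counts are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLU(nn.Module):
        """Feed-forward block with the SwiGLU activation: W2(SiLU(W1 x) * W3 x)."""
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
            self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
            self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.w2(F.silu(self.w1(x)) * self.w3(x))

    class GroupedQueryAttention(nn.Module):
        """Causal self-attention with fewer key/value heads than query heads (GQA)."""
        def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
            super().__init__()
            assert n_heads % n_kv_heads == 0
            self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
            self.head_dim = dim // n_heads
            self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
            self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
            self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
            self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, t, _ = x.shape
            q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
            k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
            v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
            # Expand K/V so each group of query heads shares one key/value head.
            rep = self.n_heads // self.n_kv_heads
            k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
            out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            return self.wo(out.transpose(1, 2).reshape(b, t, -1))

    # Illustrative "deep and thin" dimensions; embedding sharing would further
    # tie the LM head to the token embedding (lm_head.weight = tok_emb.weight).
    x = torch.randn(2, 16, 512)
    y = SwiGLU(dim=512, hidden_dim=1408)(GroupedQueryAttention(512, n_heads=8, n_kv_heads=2)(x))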

Quick Start & Requirements

  • Install: pip install -r requirement.txt
  • Prerequisites: Python 3.9, PyTorch >= 2.0. Requires data preprocessed into a specific directory structure (see README). Training is designed for multi-GPU setups (e.g., one or more nodes with 8 GPUs each).
  • Resources: Training on 1T tokens with 32 NVIDIA A100 80GB GPUs takes approximately 3-18 days, depending on model size (125M to 1.5B parameters).
  • Links: HuggingFace Models, Data Prep (a minimal checkpoint-loading sketch follows below)
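
The repository itself provides the training pipeline; for quick inference with the released checkpoints, the Hugging Face models linked above can usually be loaded through transformers. The sketch below is illustrative only; the model ID, dtype, and the trust_remote_code flag are assumptions to verify against the individual model cards.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed model ID; 350M/600M/1B/1.5B variants are also listed on the Hub.
    model_id = "facebook/MobileLLM-125M"

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # half precision keeps the memory footprint small
        trust_remote_code=True,      # may be required if the checkpoint ships custom model code
    )

    inputs = tokenizer("The capital of France is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))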

Highlighted Details

  • Achieves 2.7%/4.3% accuracy improvements over the previous state-of-the-art 125M/350M models on zero-shot commonsense reasoning.
  • Design philosophy scales to larger models (600M, 1B, 1.5B) with SoTA results.
  • Training script pretrain.sh supports multi-node configurations via torchrun.
  • Evaluation script eval.sh is provided.

Licensing & Compatibility

  • License: FAIR Noncommercial Research License (FAIR NC). This is not an OSI-approved open-source license and restricts commercial use; review the terms before adopting the code or models in a product.

Limitations & Caveats

The FAIR NC license restricts commercial use. The training setup requires substantial multi-GPU resources and data preprocessed into the repo's expected directory structure.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 26 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Alex Cheema (Cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

Top 0.1% on sourcepulse · 806 stars
Pretraining code for depth-recurrent language model research
created 5 months ago · updated 2 weeks ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 3 more.

modded-nanogpt by KellerJordan

Top 2.6% on sourcepulse · 3k stars
Language model training speedrun on 8x H100 GPUs
created 1 year ago · updated 2 weeks ago