nanotron by huggingface

Minimalistic library for large language model pretraining

Created 2 years ago
2,212 stars

Top 20.5% on SourcePulse

View on GitHub
Project Summary

Nanotron is a library for pretraining transformer models, offering a simple, performant, and scalable API for training on custom datasets. It targets researchers and engineers building large language models, enabling efficient training through advanced parallelism techniques.

How It Works

Nanotron implements 3D parallelism (Data Parallelism, Tensor Parallelism, Pipeline Parallelism) to distribute model training across multiple GPUs and nodes. It supports expert parallelism for Mixture-of-Experts (MoE) models and includes the all-forward-all-backward (AFAB) and one-forward-one-backward (1F1B) schedules for pipeline parallelism. The library provides explicit APIs for TP and PP, which eases debugging and customization, along with a ZeRO-1 optimizer and FP32 gradient accumulation for memory efficiency.
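
To make the 3D layout concrete, here is a minimal sketch of how an 8-GPU job can be carved into DP x TP x PP communication groups with plain torch.distributed. The grid layout, group-building loop, and function name are illustrative assumptions made for this summary, not nanotron's internal API; nanotron handles this bookkeeping itself.

    # Illustrative only: build DP/TP/PP groups for DP = TP = PP = 2 (8 ranks).
    # Launch with torchrun so torch.distributed can read its rendezvous env vars.
    import torch.distributed as dist

    DP, TP, PP = 2, 2, 2  # assumed sizes; DP * TP * PP must equal the world size

    def build_3d_groups():
        dist.init_process_group("nccl")
        rank = dist.get_rank()
        # Rank layout (PP, DP, TP): TP is innermost so tensor-parallel peers are
        # adjacent ranks (typically on the same node, with the fastest interconnect).
        pp_i, dp_i, tp_i = rank // (DP * TP), (rank // TP) % DP, rank % TP
        tp_group = dp_group = pp_group = None
        # Every rank must create every group, in the same order.
        for p in range(PP):
            for d in range(DP):
                g = dist.new_group([p * DP * TP + d * TP + t for t in range(TP)])
                if (p, d) == (pp_i, dp_i):
                    tp_group = g  # peers that split each weight matrix
            for t in range(TP):
                g = dist.new_group([p * DP * TP + d * TP + t for d in range(DP)])
                if (p, t) == (pp_i, tp_i):
                    dp_group = g  # peers that all-reduce gradients
        for d in range(DP):
            for t in range(TP):
                g = dist.new_group([p * DP * TP + d * TP + t for p in range(PP)])
                if (d, t) == (dp_i, tp_i):
                    pp_group = g  # peers that exchange activations between stages
        return tp_group, dp_group, pp_group

Pipeline schedules such as AFAB and 1F1B then decide the order in which micro-batch forward and backward passes flow over the PP group.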

Quick Start & Requirements

  • Installation:
    uv venv nanotron --python 3.11 && source nanotron/bin/activate
    uv pip install torch --index-url https://download.pytorch.org/whl/cu124
    git clone https://github.com/huggingface/nanotron.git && cd nanotron
    uv pip install -e .
    uv pip install datasets transformers datatrove[io] numba wandb ninja triton "flash-attn>=2.5.0" --no-build-isolation
    huggingface-cli login
    wandb login
    git-lfs --version
    
  • Prerequisites: Python 3.11, PyTorch with CUDA 12.4, Git LFS (a quick environment sanity check is sketched after this list).
  • Resources: Requires multiple GPUs (e.g., 8 x H100s for the tiny Llama example).
  • Docs: Ultrascale Playbook, Your First Training
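
Before launching a run, a quick check that the environment matches the prerequisites above can save a failed multi-GPU job. The script below is not part of nanotron; it is a small, assumed helper that only verifies the Python/CUDA versions, GPU count, and the extra packages installed above.

    # Post-install sanity check (assumed helper, not shipped with nanotron).
    import importlib.util
    import sys

    import torch

    assert sys.version_info[:2] == (3, 11), "expected Python 3.11"
    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)  # expect 12.4
    print("visible GPUs:", torch.cuda.device_count())  # e.g. 8 for the tiny Llama example
    for pkg in ("datasets", "transformers", "datatrove", "flash_attn", "wandb", "triton"):
        status = "ok" if importlib.util.find_spec(pkg) else "MISSING"
        print(f"{pkg}: {status}")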

Highlighted Details

  • Supports 3D parallelism (DP+TP+PP) and expert parallelism for MoEs.
  • Includes AFAB and 1F1B schedules for pipeline parallelism.
  • Features ZeRO-1 optimizer, FP32 gradient accumulation (sketched after this list), and parameter tying/sharding.
  • Offers spectral µTransfer parametrization for scaling neural networks.
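
As a rough illustration of the FP32 gradient accumulation highlighted above: parameters stay in bf16 while each micro-batch's gradients are folded into fp32 buffers before the optimizer step. This is a minimal sketch of the general technique using a toy model and loss, not nanotron's implementation.

    # FP32 gradient accumulation over bf16 parameters (general technique, illustrative).
    import torch

    model = torch.nn.Linear(256, 256).bfloat16()  # bf16 training parameters
    fp32_grads = {name: torch.zeros_like(p, dtype=torch.float32)
                  for name, p in model.named_parameters()}  # fp32 accumulation buffers

    def accumulate(batch: torch.Tensor) -> None:
        """Backward on one micro-batch, then fold the bf16 grads into fp32 buffers."""
        loss = model(batch).float().pow(2).mean()  # toy loss
        loss.backward()
        for name, p in model.named_parameters():
            fp32_grads[name] += p.grad.float()  # upcast before accumulating
            p.grad = None  # drop the low-precision gradient

    for _ in range(4):  # four micro-batches per optimizer step
        accumulate(torch.randn(8, 256, dtype=torch.bfloat16))
    # A real step would now feed fp32_grads to an fp32 optimizer (whose states ZeRO-1
    # shards across data-parallel ranks) and copy the updated master weights back
    # into the bf16 parameters.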

Maintenance & Community

  • Actively developed by Hugging Face.
  • Examples cover custom dataloaders, Mamba, MoE, and µTransfer.
  • Roadmap includes FP8 training, ZeRO-3, torch.compile, and ring attention.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

  • FP8 training and ZeRO-3 (FSDP) are on the roadmap, not yet implemented.
  • torch.compile support is also planned for future releases.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 76 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed
3.4% · 1k stars
Framework for scaling multimodal model training across accelerators
Created 5 months ago, updated 3 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 20 more.

alpa by alpa-projects
0.0% · 3k stars
Auto-parallelization framework for large-scale neural network training and serving
Created 4 years ago, updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch
0.7% · 4k stars
PyTorch platform for generative AI model training research
Created 1 year ago, updated 20 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI
0.2% · 7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago, updated 2 days ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech
0.1% · 41k stars
AI system for large-scale parallel training
Created 3 years ago, updated 14 hours ago