nanotron by huggingface

Minimalistic library for large language model pretraining

created 1 year ago
2,078 stars

Top 22.0% on sourcepulse

Project Summary

Nanotron is a library for pretraining transformer models, offering a simple, performant, and scalable API for custom datasets. It targets researchers and engineers building large language models, enabling efficient training through advanced parallelism techniques.

How It Works

Nanotron implements 3D parallelism (Data Parallelism, Tensor Parallelism, and Pipeline Parallelism) to distribute model training across multiple GPUs and nodes. It supports expert parallelism for Mixture-of-Experts (MoE) models and includes optimized pipeline schedules (AFAB, i.e. all-forward-all-backward, and 1F1B, i.e. one-forward-one-backward). The library exposes explicit APIs for TP and PP, which eases debugging and customization, and pairs them with a ZeRO-1 optimizer for memory efficiency and FP32 gradient accumulation for numerically stable training.
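
The snippet below illustrates how such a 3D layout maps a flat GPU rank to data-, pipeline-, and tensor-parallel coordinates. It is a framework-agnostic Python sketch with made-up example sizes (dp=2, pp=2, tp=2), not nanotron's internal process-group code.

    # Illustrative only: decompose a global rank into (dp_rank, pp_rank, tp_rank).
    def rank_to_3d(rank: int, dp: int, pp: int, tp: int) -> tuple[int, int, int]:
        assert rank < dp * pp * tp, "rank out of range for this layout"
        tp_rank = rank % tp              # fastest-varying: tensor-parallel peers share layer shards
        pp_rank = (rank // tp) % pp      # next: pipeline stage
        dp_rank = rank // (tp * pp)      # slowest: data-parallel replica
        return dp_rank, pp_rank, tp_rank

    # Example: 8 GPUs split as dp=2, pp=2, tp=2, so each model replica spans 4 GPUs.
    for r in range(8):
        print(r, rank_to_3d(r, dp=2, pp=2, tp=2))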

Quick Start & Requirements

  • Installation (a quick post-install sanity check follows this list):
    # create and activate a Python 3.11 virtual environment
    uv venv nanotron --python 3.11 && source nanotron/bin/activate
    # install PyTorch built against CUDA 12.4
    uv pip install torch --index-url https://download.pytorch.org/whl/cu124
    # install nanotron in editable mode (run from a clone of the repo)
    uv pip install -e .
    # install training dependencies, including FlashAttention
    uv pip install datasets transformers "datatrove[io]" numba wandb ninja triton "flash-attn>=2.5.0" --no-build-isolation
    # log in to the Hugging Face Hub and Weights & Biases, then confirm Git LFS is installed
    huggingface-cli login
    wandb login
    git-lfs --version
    
  • Prerequisites: Python 3.11, PyTorch with CUDA 12.4, Git LFS.
  • Resources: Requires multiple GPUs (e.g., 8 x H100s for the tiny Llama example).
  • Docs: Ultrascale Playbook, Your First Training
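
As a quick post-install check (a minimal Python sketch; it assumes the steps above completed in the active environment and that a CUDA GPU is visible):

    # Sanity-check the environment created by the installation steps above.
    import torch
    print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())

    import flash_attn  # FlashAttention kernels installed above
    print("flash-attn:", flash_attn.__version__)

    import nanotron  # the editable install from `uv pip install -e .`
    print("nanotron imported from:", nanotron.__file__)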

Highlighted Details

  • Supports 3D parallelism (DP+TP+PP) and expert parallelism for MoEs.
  • Includes AFAB and 1F1B schedules for pipeline parallelism.
  • Features a ZeRO-1 optimizer, FP32 gradient accumulation (sketched after this list), and parameter tying/sharding.
  • Offers spectral µTransfer parametrization for scaling neural networks.
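
To make the FP32 gradient accumulation above concrete: gradients produced in bf16 are added into FP32 buffers across micro-batches, so repeated additions do not lose precision. The sketch below is a generic PyTorch illustration of that idea with a toy model, not nanotron's actual accumulator.

    # Toy illustration of FP32 gradient accumulation over a bf16 model.
    import torch

    model = torch.nn.Linear(16, 16, dtype=torch.bfloat16)
    # One FP32 accumulation buffer per parameter.
    fp32_grads = [torch.zeros(p.shape, dtype=torch.float32) for p in model.parameters()]

    for _ in range(4):  # four micro-batches
        model.zero_grad()
        x = torch.randn(4, 16, dtype=torch.bfloat16)
        model(x).float().pow(2).mean().backward()
        # Accumulate this micro-batch's bf16 gradients into the FP32 buffers.
        for buf, p in zip(fp32_grads, model.parameters()):
            buf += p.grad.float()

    # An optimizer step would then consume fp32_grads (e.g. with FP32 master weights).
    print([round(g.norm().item(), 3) for g in fp32_grads])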

Maintenance & Community

  • Actively developed by Hugging Face.
  • Examples cover custom dataloaders, Mamba, MoE, and µTransfer.
  • Roadmap includes FP8 training, ZeRO-3, torch.compile, and ring attention.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

  • FP8 training and ZeRO-3 (FSDP) are on the roadmap, not yet implemented.
  • torch.compile support is also planned for future releases.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 2
  • Issues (30d): 2

Star History

  • 271 stars in the last 90 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0%
309
Framework for large-scale transformer optimization
created 3 years ago
updated 2 years ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

0.1%
839
PyTorch-native framework for LLM training
created 1 year ago
updated 3 weeks ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0%
402
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Travis Fischer (Founder of Agentic).

lingua by facebookresearch

0.1%
5k
LLM research codebase for training and inference
created 9 months ago
updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Zhuohan Li (Author of vLLM), and 6 more.

torchtitan by pytorch

0.9%
4k
PyTorch platform for generative AI model training research
created 1 year ago
updated 22 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 6 more.

gpt-neox by EleutherAI

0.1%
7k
Framework for training large-scale autoregressive language models
created 4 years ago
updated 1 week ago