Minimalistic library for large language model pretraining
Nanotron is a library for pretraining transformer models, offering a simple, performant, and scalable API for training on custom datasets. It targets researchers and engineers building large language models, enabling efficient training through advanced parallelism techniques.
How It Works
Nanotron implements 3D parallelism (data, tensor, and pipeline parallelism) to distribute model training across multiple GPUs and nodes. It supports expert parallelism for Mixture-of-Experts (MoE) models and includes optimized pipeline schedules (AFAB and 1F1B). The library exposes explicit APIs for tensor and pipeline parallelism, which makes debugging and customization easier, and provides a ZeRO-1 optimizer for memory efficiency along with FP32 gradient accumulation for numerical stability.
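As an illustration, the degree of each parallelism dimension is chosen in the training config. The snippet below is a minimal sketch of a parallelism section, loosely modeled on the example configs in the repository; the exact field names and values shown here are assumptions and may differ across versions.
parallelism:
  dp: 2                    # data-parallel replicas
  tp: 2                    # tensor-parallel degree (weights sharded across 2 GPUs)
  pp: 2                    # pipeline-parallel stages
  pp_engine: 1f1b          # pipeline schedule (1F1B; AFAB also available)
  expert_parallel_size: 1  # >1 only for MoE models
With this layout a run occupies dp * tp * pp = 8 GPUs.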
Quick Start & Requirements
uv venv nanotron --python 3.11 && source nanotron/bin/activate
uv pip install torch --index-url https://download.pytorch.org/whl/cu124
uv pip install -e .
uv pip install datasets transformers datatrove[io] numba wandb ninja triton "flash-attn>=2.5.0" --no-build-isolation
huggingface-cli login
wandb login
git-lfs --version
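After setup, training is launched with torchrun. The command below is a sketch assuming the example config shipped in the repository and a single node with 8 GPUs; adapt the config path and GPU count to your setup.
CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml
The --nproc_per_node value should match the product of the parallelism degrees set in the config (dp * tp * pp in the sketch above).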
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Several features are not yet supported; torch.compile and ring attention are planned for future releases.