nanoT5 by PiotrNawrot

PyTorch code for T5 pre-training and fine-tuning on a single GPU

Created 2 years ago
1,010 stars

Top 37.0% on SourcePulse

Project Summary

nanoT5 provides a PyTorch-based framework for pre-training and fine-tuning T5-style encoder-decoder language models with limited computational resources. It targets researchers and practitioners who need an accessible template for custom T5 model development, and aims to democratize LLM pre-training by demonstrating that it is feasible on a single A100 GPU in under 24 hours.

How It Works

The project streamlines the T5 training pipeline, leveraging HuggingFace Accelerate for distributed training primitives, Neptune.ai for experiment tracking, and Hydra for hyperparameter management. A key contribution is an augmented AdamW optimizer with RMS scaling, which stabilizes training and improves performance over the original Adafactor optimizer, especially when paired with a cosine learning rate scheduler. The framework also preprocesses the C4 dataset on the fly and supports mixed-precision training and PyTorch 2.0 compilation for efficiency.
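The repository's own optimizer code should be consulted for details; the sketch below is only a minimal illustration of the RMS-scaling idea, using a hypothetical `RMSScaledAdamW` class whose per-tensor step size is scaled by the parameter's root mean square, in the spirit of Adafactor's relative step sizes.

```python
import torch
from torch.optim import Optimizer

class RMSScaledAdamW(Optimizer):
    """Hypothetical sketch: an AdamW-style update whose per-tensor step size is
    scaled by the parameter's root mean square, mimicking Adafactor's relative
    updates. Illustration only; see nanoT5's code for the actual optimizer."""

    def __init__(self, params, lr=2e-2, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state.update(step=0,
                                 exp_avg=torch.zeros_like(p),
                                 exp_avg_sq=torch.zeros_like(p))
                state["step"] += 1
                beta1, beta2 = group["betas"]

                # Standard Adam first/second moments with bias correction.
                state["exp_avg"].mul_(beta1).add_(p.grad, alpha=1 - beta1)
                state["exp_avg_sq"].mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                m_hat = state["exp_avg"] / (1 - beta1 ** state["step"])
                v_hat = state["exp_avg_sq"] / (1 - beta2 ** state["step"])
                update = m_hat / (v_hat.sqrt() + group["eps"])

                # RMS scaling: the effective step size grows with the parameter's
                # own scale, as in Adafactor's relative step size.
                step_size = group["lr"] * p.pow(2).mean().sqrt().clamp(min=group["eps"])

                # Decoupled weight decay (the "W" in AdamW), then the scaled update.
                p.mul_(1 - group["lr"] * group["weight_decay"])
                p.sub_(step_size * update)
```

Pairing such an optimizer with torch.optim.lr_scheduler.CosineAnnealingLR (after a short warmup) gives the cosine decay reported here as outperforming the inverse-square-root schedule typically used with Adafactor.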

Quick Start & Requirements

  • Install: pip install -r requirements.txt after cloning the repository.
  • Prerequisites: Python 3.8+; PyTorch 2.0+ is recommended for torch.compile. A single A100 GPU is recommended for reproducing the reported sub-24-hour pre-training times.
  • Setup: The C4 dataset is downloaded and preprocessed on the fly during pre-training; the Super-Natural Instructions dataset is needed for fine-tuning (see the streaming sketch after this list).
  • Links: GitHub, Paper
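For context on the on-the-fly handling noted above, the sketch below shows one common way to stream and tokenize C4 with the HuggingFace datasets library; the dataset and tokenizer identifiers are reasonable defaults, not taken from the repository's configs.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream C4 so the full corpus never has to be downloaded up front; nanoT5
# tokenizes and builds training examples on the fly in a similar spirit.
# The dataset/tokenizer names below are assumptions, not the repo's config.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = c4.map(tokenize, batched=True, remove_columns=["text", "timestamp", "url"])
print(next(iter(tokenized)).keys())  # input_ids, attention_mask
```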

Highlighted Details

  • Achieves 40.7 RougeL on Super-Natural Instructions with 16 hours of pre-training on a single A100, closely matching the original T5-base-v1.1 performance.
  • Demonstrates that AdamW with RMS scaling and a cosine LR schedule outperforms Adafactor with an inverse-square-root schedule for T5 pre-training.
  • Offers a simplified T5 model implementation for educational purposes.
  • Supports efficient training through mixed precision (TF32, BF16) and torch.compile (a minimal setup sketch follows this list).
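As a rough illustration of those efficiency features (not the project's actual training loop), a bare-bones TF32/BF16 plus torch.compile setup might look like the following; the checkpoint name is only an example.

```python
import torch
from transformers import T5ForConditionalGeneration

# Enable TF32 matmuls, compile the model (PyTorch 2.0+), and train under BF16
# autocast. This mirrors the features listed above; it is not nanoT5's own
# training loop, and the checkpoint name is an assumption.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-base").cuda()
model = torch.compile(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(batch):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    return loss.item()
```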

Maintenance & Community

The project is maintained by Piotr Nawrot. Community interaction is encouraged via GitHub Issues.

Licensing & Compatibility

The repository is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The reported results were obtained on an A100 GPU; performance on other hardware may vary. In keeping with the project's focus on simplicity, advanced parallelism techniques such as tensor or pipeline parallelism are not implemented; they are deemed unnecessary at this scale and would add significant complexity. FP16 experiments diverged, so supported precisions are limited to TF32 and BF16.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI) and Daniel Han (Cofounder of Unsloth).

cifar10-airbench by KellerJordan: Fast CIFAR-10 training benchmarks
  • 1.0% · 295 stars · Created 1 year ago · Updated 2 months ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Vincent Weisser (Cofounder of Prime Intellect), and 4 more.

Sophia by Liuhong99: Optimizer for language model pre-training (research paper)
  • 0.1% · 970 stars · Created 2 years ago · Updated 1 year ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs), Albert Gu (Cofounder of Cartesia; Professor at CMU), and 2 more.

Muon by KellerJordan: Optimizer for neural network hidden layers
  • 1.7% · 2k stars · Created 10 months ago · Updated 2 months ago

Starred by Clement Delangue (Cofounder of Hugging Face), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 20 more.

accelerate by huggingface: PyTorch training helper for distributed execution
  • 0.3% · 9k stars · Created 4 years ago · Updated 1 day ago

Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of Clickhouse), and 29 more.

llm.c by karpathy: LLM training in pure C/CUDA, no PyTorch needed
  • 0.2% · 28k stars · Created 1 year ago · Updated 2 months ago