nanoT5 by PiotrNawrot

PyTorch code for T5 pre-training and fine-tuning on a single GPU

created 2 years ago
1,006 stars

Top 37.8% on sourcepulse

View on GitHub
Project Summary

nanoT5 provides a PyTorch-based framework for pre-training and fine-tuning T5-style encoder-decoder language models with limited computational resources. It targets researchers and practitioners who need an accessible template for custom T5 model development, aiming to democratize LLM pre-training by demonstrating feasibility on a single A100 GPU within 24 hours.

How It Works

The project streamlines the T5 training pipeline, using HuggingFace Accelerate for device placement and distributed training primitives, Neptune.ai for experiment tracking, and Hydra for hyperparameter management. A key change is an AdamW optimizer augmented with matrix-wise RMS scaling of the step size, which stabilizes training and improves performance over the original Adafactor optimizer, especially when paired with a cosine learning-rate schedule. The framework also preprocesses the C4 dataset on the fly and supports mixed-precision training and PyTorch 2.0 compilation for efficiency.
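The RMS-scaled AdamW idea can be illustrated with a short sketch: compute the usual Adam step, then scale it by the root-mean-square of the parameter tensor, analogous to Adafactor's relative step size. The class name, hyperparameter defaults, and details below are illustrative assumptions, not the repository's actual optimizer code:

```python
import torch


class AdamWScale(torch.optim.Optimizer):
    """Sketch of AdamW with per-tensor RMS scaling of the step size.
    Illustrative only; names and defaults are assumptions."""

    def __init__(self, params, lr=2e-2, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                m, v = state["exp_avg"], state["exp_avg_sq"]
                # Standard Adam moment updates with bias correction.
                m.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                v.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                m_hat = m / (1 - beta1 ** state["step"])
                v_hat = v / (1 - beta2 ** state["step"])
                update = m_hat / (v_hat.sqrt() + group["eps"])
                # Scale the step by the parameter tensor's RMS (with a floor),
                # similar to Adafactor's relative step size.
                rms = p.norm() / (p.numel() ** 0.5)
                step_size = group["lr"] * max(rms.item(), 1e-3)
                # Decoupled weight decay, then the scaled update.
                if group["weight_decay"] > 0:
                    p.add_(p, alpha=-group["weight_decay"] * step_size)
                p.add_(update, alpha=-step_size)
```

Paired with a cosine schedule (for example torch.optim.lr_scheduler.CosineAnnealingLR, or a warmup-plus-cosine variant), this is the kind of combination the project reports as outperforming Adafactor with an inverse-square-root schedule.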

Quick Start & Requirements

  • Install: pip install -r requirements.txt after cloning the repository.
  • Prerequisites: Python 3.8+; PyTorch 2.0 is required for torch.compile. A single A100 GPU is assumed for the reported <24-hour pre-training times.
  • Setup: The C4 pre-training data is downloaded and preprocessed on the fly (see the sketch after this list); the Super-Natural Instructions dataset is only needed for fine-tuning.
  • Links: GitHub, Paper
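A rough idea of on-the-fly C4 handling using HuggingFace datasets streaming; the dataset and tokenizer identifiers, column names, and truncation length below are assumptions for illustration, not necessarily the repository's exact pipeline:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Stream C4 rather than downloading the full corpus up front
# (illustrative; the repository's actual data pipeline may differ).
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

def tokenize(example):
    # Truncation length is an arbitrary choice for this sketch.
    return tokenizer(example["text"], truncation=True, max_length=512)

stream = c4.map(tokenize, remove_columns=["text", "timestamp", "url"])

# Peek at a couple of tokenized examples as they stream in.
for i, example in enumerate(stream):
    print(len(example["input_ids"]))
    if i >= 1:
        break
```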

Highlighted Details

  • Achieves 40.7 RougeL on Super-Natural Instructions with 16 hours of pre-training on a single A100, closely matching the original T5-base-v1.1 performance.
  • Demonstrates that AdamW with RMS scaling and a cosine LR schedule outperforms Adafactor with an inverse-square-root schedule for T5 pre-training.
  • Offers a simplified T5 model implementation for educational purposes.
  • Supports efficient training through mixed precision (TF32, BF16) and torch.compile (see the sketch after this list).
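The efficiency features listed above (TF32, BF16 autocast, torch.compile) map onto standard PyTorch switches. A minimal sketch of a training step, assuming an Ampere-class GPU and a small illustrative T5 config, not the repository's training loop:

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Allow TF32 matmuls/convolutions on Ampere GPUs such as the A100.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Tiny illustrative config; nanoT5 targets T5-base-v1.1-sized models.
config = T5Config(d_model=256, d_ff=1024, num_layers=4, num_decoder_layers=4, num_heads=4)
model = T5ForConditionalGeneration(config).cuda()
model = torch.compile(model)  # PyTorch 2.0 graph compilation

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(batch):
    # batch: dict with input_ids, attention_mask, labels (CUDA tensors).
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss  # forward pass under BF16 autocast
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return loss.detach()
```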

Maintenance & Community

The project is maintained by Piotr Nawrot. Community interaction is encouraged via GitHub Issues.

Licensing & Compatibility

The repository is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The reported performance is achieved on an A100 GPU; performance on other hardware may vary. While the project aims for simplicity, advanced parallelism techniques like tensor or pipeline parallelism are not implemented, as they are deemed unnecessary for small-scale training and add significant complexity. FP16 precision experiments diverged, limiting precision options.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack
0.4% · 258 stars
Efficiently train foundation models with PyTorch
created 1 year ago · updated 1 week ago
Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM
0.3% · 1k stars
Transformer library for flexible model development
created 3 years ago · updated 7 months ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 3 more.

modded-nanogpt by KellerJordan
2.6% · 3k stars
Language model training speedrun on 8x H100 GPUs
created 1 year ago · updated 2 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Zhuohan Li (Author of vLLM), and 6 more.

torchtitan by pytorch
0.9% · 4k stars
PyTorch platform for generative AI model training research
created 1 year ago · updated 18 hours ago