PyTorch code for T5 pre-training and fine-tuning on a single GPU
nanoT5 provides a PyTorch-based framework for pre-training and fine-tuning T5-style encoder-decoder language models under limited computational resources. It targets researchers and practitioners who need an accessible template for custom T5 model development, aiming to democratize LLM pre-training by showing that a T5 model can be pre-trained on a single A100 GPU in under 24 hours.
How It Works
The project optimizes the T5 training pipeline, leveraging HuggingFace Accelerate for distributed training primitives, Neptune.ai for experiment tracking, and Hydra for hyperparameter management. A key innovation is an augmented AdamW optimizer with RMS scaling, which stabilizes training and improves performance over the original Adafactor optimizer, especially when paired with a cosine learning-rate scheduler. The framework also preprocesses the C4 dataset on the fly and supports mixed-precision training and PyTorch 2.0 compilation for efficiency.
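To make the optimizer idea concrete, the sketch below shows one way to add RMS scaling to AdamW: the per-tensor step size is multiplied by the root-mean-square of that parameter tensor, in the spirit of Adafactor's relative step size. This is a minimal illustration, not nanoT5's actual implementation; the class name, default hyperparameters, and the eps_rms floor are assumptions.

import torch
from torch.optim import Optimizer

def rms(t):
    # Root-mean-square of a tensor: ||t||_2 / sqrt(numel).
    return t.norm(2) / (t.numel() ** 0.5)

class AdamWWithRMSScaling(Optimizer):
    # Illustrative sketch: AdamW whose per-tensor step size is multiplied by
    # max(eps_rms, RMS(param)), mimicking Adafactor's relative step size.
    def __init__(self, params, lr=2e-2, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0.0, eps_rms=1e-3):
        defaults = dict(lr=lr, betas=betas, eps=eps,
                        weight_decay=weight_decay, eps_rms=eps_rms)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p)
                    state["exp_avg_sq"] = torch.zeros_like(p)
                state["step"] += 1
                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                # Standard Adam first/second moment updates.
                exp_avg.mul_(beta1).add_(p.grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(p.grad, p.grad, value=1 - beta2)
                bias1 = 1 - beta1 ** state["step"]
                bias2 = 1 - beta2 ** state["step"]
                denom = (exp_avg_sq / bias2).sqrt().add_(group["eps"])
                # The only change vs. plain AdamW: scale the step size by the
                # parameter tensor's RMS, floored at eps_rms.
                step_size = (group["lr"] / bias1) * max(group["eps_rms"], rms(p).item())
                if group["weight_decay"] != 0.0:
                    # Decoupled weight decay, as in AdamW.
                    p.mul_(1 - group["lr"] * group["weight_decay"])
                p.addcdiv_(exp_avg, denom, value=-step_size)

In nanoT5 this optimizer is paired with a cosine learning-rate schedule; a warmup-plus-cosine schedule such as torch.optim.lr_scheduler.CosineAnnealingLR would play that role in the sketch above.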
Quick Start & Requirements
Clone the repository and install dependencies with pip install -r requirements.txt. PyTorch 2.0 is needed to take advantage of torch.compile. A single A100 GPU is recommended for achieving the reported <24 hour pre-training times.
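For orientation, the sketch below shows how the pieces described above (Accelerate, mixed precision, torch.compile, a cosine schedule) might be wired together for a single training step. It is a simplified stand-in for the project's Hydra-driven entry point; the BF16 choice, learning rate, schedule length, and batch format are assumptions.

import torch
from accelerate import Accelerator
from transformers import T5Config, T5ForConditionalGeneration

accelerator = Accelerator(mixed_precision="bf16")  # assumed BF16, since FP16 runs diverged
model = T5ForConditionalGeneration(T5Config())     # placeholder config, not the project's exact model
model = torch.compile(model)                       # PyTorch 2.0 compilation for throughput
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-2)  # stand-in for the RMS-scaled variant above
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)

model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)

def training_step(batch):
    # batch: dict with input_ids, attention_mask, labels produced by an
    # on-the-fly C4 span-corruption pipeline (the dataloader would also be
    # passed through accelerator.prepare in a full script).
    outputs = model(**batch)
    accelerator.backward(outputs.loss)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return outputs.loss.detach()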
Highlighted Details
Pre-training completes in under 24 hours on a single A100 GPU, with speedups from mixed-precision training and PyTorch 2.0's torch.compile.
Maintenance & Community
The project is maintained by Piotr Nawrot. Community interaction is encouraged via GitHub Issues.
Licensing & Compatibility
The repository is released under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The reported performance is achieved on an A100 GPU; performance on other hardware may vary. While the project aims for simplicity, advanced parallelism techniques like tensor or pipeline parallelism are not implemented, as they are deemed unnecessary for small-scale training and add significant complexity. FP16 precision experiments diverged, limiting precision options.