torchtitan by pytorch

PyTorch platform for generative AI model training research

Created 1 year ago · 4,639 stars · Top 10.7% on SourcePulse

View on GitHub
Project Summary

TorchTitan is a PyTorch-native platform for large-scale generative AI model training, targeting researchers and developers seeking a flexible, minimal, and extensible framework. It aims to accelerate innovation by simplifying the implementation of advanced distributed training techniques for models like LLMs and diffusion models.

How It Works

TorchTitan leverages PyTorch's native scaling features, offering composable multi-dimensional parallelism (Tensor, Pipeline, and Context Parallel) alongside techniques such as FSDP2 with per-parameter sharding, activation checkpointing, and Float8 support. The design prioritizes ease of understanding, minimal model-code changes to enable parallelism, and a clean architecture of reusable components.
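
As an illustration of the FSDP2 per-parameter sharding mentioned above, below is a minimal sketch using PyTorch's composable fully_shard API. It assumes a recent PyTorch nightly and a model that exposes its transformer blocks as model.layers; it is not torchtitan's actual wiring, which composes further TP/PP/CP mesh dimensions on top.

```python
# Minimal FSDP2 sketch: per-parameter sharding via the composable
# fully_shard API. The `model.layers` attribute is an assumption about
# the model's structure, not torchtitan code.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard

def apply_fsdp2(model: torch.nn.Module, world_size: int) -> torch.nn.Module:
    # 1-D mesh over all ranks; torchtitan composes additional dimensions
    # (TP/PP/CP) on top of this.
    mesh = init_device_mesh("cuda", (world_size,))
    # Shard each transformer block, then the root module, so parameters
    # are sharded individually rather than flattened as in FSDP1.
    for block in model.layers:
        fully_shard(block, mesh=mesh)
    fully_shard(model, mesh=mesh)
    return model
```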

Quick Start & Requirements

  • Install dependencies with pip install -r requirements.txt, then install the PyTorch nightly with pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu126 --force-reinstall.
  • Requires a PyTorch nightly build (CUDA 12.6 recommended); AMD GPUs are supported via ROCm 6.3. A post-install sanity check is sketched after this list.
  • Tokenizer download script provided for Llama 3.1 models.
  • Official documentation and a quick-start guide are available.
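
A quick way to confirm the nightly wheel is active (a generic PyTorch check, not a torchtitan script):

```python
# Post-install sanity check for the nightly CUDA wheel.
import torch

print(torch.__version__)          # nightly builds carry a ".dev" version tag
print(torch.version.cuda)         # expect "12.6" for the cu126 wheel
print(torch.cuda.is_available())  # True if a compatible GPU is visible
```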

Highlighted Details

  • Supports multi-dimensional composable parallelisms: FSDP2, Tensor Parallel (async TP), Pipeline Parallel, Context Parallel.
  • Features selective/full activation checkpointing, distributed checkpointing (a minimal sketch follows this list), and Float8 support.
  • Integrates with torch.compile and offers interoperable checkpoints loadable by torchtune.
  • Includes comprehensive logging (Tensorboard/W&B), debugging tools (profiling), and helper scripts for model conversion and memory estimation.
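
To give a flavor of the distributed checkpointing highlighted above, here is a minimal sketch built on PyTorch's torch.distributed.checkpoint (DCP) APIs. The helper names and the path argument are illustrative; torchtitan wraps this machinery in its own checkpoint manager.

```python
# Minimal distributed-checkpoint sketch using PyTorch's DCP APIs; helper
# names are illustrative, not torchtitan's checkpoint manager.
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import get_state_dict, set_state_dict

def save_checkpoint(model, optimizer, path: str) -> None:
    # Collects sharded (e.g. FSDP2) tensors into a saveable state dict;
    # every rank participates and writes its own shards.
    model_sd, optim_sd = get_state_dict(model, optimizer)
    dcp.save({"model": model_sd, "optim": optim_sd}, checkpoint_id=path)

def load_checkpoint(model, optimizer, path: str) -> None:
    model_sd, optim_sd = get_state_dict(model, optimizer)
    # dcp.load reads shards in place into the provided state dict ...
    dcp.load({"model": model_sd, "optim": optim_sd}, checkpoint_id=path)
    # ... which is then written back into the live model and optimizer.
    set_state_dict(model, optimizer,
                   model_state_dict=model_sd, optim_state_dict=optim_sd)
```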

Maintenance & Community

  • Active development with recent updates including Llama 4 support and diffusion model experiments.
  • Presentations at PyTorch Conference 2024 and an upcoming ICLR 2025 poster.
  • Community discussion via the PyTorch forum.

Licensing & Compatibility

  • Licensed under BSD 3-Clause.
  • Users must adhere to separate licenses for third-party data and models.

Limitations & Caveats

The project is in a pre-release state, so breaking changes should be expected as development continues. Llama 3.1 training has been showcased at up to 512 GPUs, but broader model support remains experimental.

Health Check

  • Last commit: 5 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 173
  • Issues (30d): 25
  • Star history: 152 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

Top 1.6% · 1k stars
Framework for scaling multimodal model training across accelerators
Created 7 months ago · Updated 9 hours ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 7 more.

lingua by facebookresearch

Top 0.0% · 5k stars
LLM research codebase for training and inference
Created 1 year ago · Updated 3 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

Top 0.0% · 41k stars
AI system for large-scale parallel training
Created 4 years ago · Updated 3 weeks ago