torchtitan by pytorch

PyTorch platform for generative AI model training research

created 1 year ago
4,143 stars

Top 12.1% on sourcepulse

Project Summary

TorchTitan is a PyTorch-native platform for large-scale generative AI model training, targeting researchers and developers seeking a flexible, minimal, and extensible framework. It aims to accelerate innovation by simplifying the implementation of advanced distributed training techniques for models like LLMs and diffusion models.

How It Works

TorchTitan leverages PyTorch's native scaling features, offering composable multi-dimensional parallelisms (Tensor, Pipeline, Context) and advanced techniques like FSDP2 with per-parameter sharding, activation checkpointing, and Float8 support. Its design prioritizes ease of understanding, minimal code modification for parallelism, and a clean, reusable component-based architecture.
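
Below is a minimal sketch of FSDP2-style per-parameter sharding using stock PyTorch APIs (torchtitan drives the same machinery through its own configuration; the model, sizes, and launch command here are illustrative assumptions, not torchtitan's code):

    # Assumes PyTorch >= 2.6 (or a nightly), at least one CUDA GPU, and a
    # torchrun launch, e.g.: torchrun --nproc_per_node=2 fsdp2_sketch.py
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import fully_shard

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # A stand-in transformer block; torchtitan applies sharding per block.
    model = nn.TransformerEncoderLayer(d_model=512, nhead=8, device="cuda")

    # FSDP2 shards each parameter tensor individually (per-parameter
    # sharding), rather than flattening parameter groups as FSDP1 did.
    fully_shard(model)

    x = torch.randn(8, 16, 512, device="cuda")
    model(x).sum().backward()  # parameters gather/reshard around fwd/bwd
    dist.destroy_process_group()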

Quick Start & Requirements

  • Install dependencies with pip install -r requirements.txt, then install a PyTorch nightly wheel with pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu126 --force-reinstall.
  • Requires PyTorch nightly builds (CUDA 12.6 recommended); AMD GPU support is available via ROCm 6.3. A sanity-check snippet follows this list.
  • A tokenizer download script is provided for Llama 3.1 models.
  • Official documentation and a quick-start guide are available.
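
As a quick way to confirm the nightly wheel took effect (a hypothetical check, not part of torchtitan's documentation):

    import torch

    # Nightly wheels carry a ".dev" version suffix, e.g. "2.x.0.devYYYYMMDD+cu126".
    print(torch.__version__)
    # Should print True once the CUDA 12.6 (or ROCm 6.3) runtime is set up.
    print(torch.cuda.is_available())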

Highlighted Details

  • Supports multi-dimensional composable parallelisms: FSDP2, Tensor Parallel (async TP), Pipeline Parallel, Context Parallel.
  • Features selective/full activation checkpointing, distributed checkpointing, and Float8 support (a minimal activation-checkpointing sketch follows this list).
  • Integrates with torch.compile and offers interoperable checkpoints loadable by torchtune.
  • Includes comprehensive logging (Tensorboard/W&B), debugging tools (profiling), and helper scripts for model conversion and memory estimation.
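
To make the activation-checkpointing bullet concrete, here is a hedged sketch using generic PyTorch APIs (torch.utils.checkpoint composed with torch.compile), not torchtitan's own selective-AC configuration:

    import torch
    import torch.nn as nn
    from torch.utils.checkpoint import checkpoint

    block = nn.TransformerEncoderLayer(d_model=256, nhead=4)

    def forward_with_ac(x):
        # Drop this block's activations after forward and recompute them
        # during backward, trading compute for memory ("full" AC per block).
        return checkpoint(block, x, use_reentrant=False)

    compiled = torch.compile(forward_with_ac)  # AC composes with compile
    y = compiled(torch.randn(2, 10, 256))
    y.sum().backward()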

Maintenance & Community

  • Active development with recent updates including Llama 4 support and diffusion model experiments.
  • Presentations at PyTorch Conference 2024 and an upcoming ICLR 2025 poster.
  • Community discussion via the PyTorch forum.

Licensing & Compatibility

  • Licensed under BSD 3-Clause.
  • Users must adhere to separate licenses for third-party data and models.

Limitations & Caveats

The project is in a pre-release state, so breaking changes should be expected. Llama 3.1 training has been showcased at up to 512 GPUs, but support for other models remains experimental.

Health Check

  • Last commit: 18 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 129
  • Issues (30d): 36
  • Star history: 525 stars in the last 90 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), Nathan Lambert (AI Researcher at AI2), and 1 more.

unified-io-2 by allenai

0.3% · 619 stars
Unified-IO 2 code for training, inference, and demo
created 1 year ago · updated 1 year ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine

0.1% · 839 stars
PyTorch-native framework for LLM training
created 1 year ago · updated 3 weeks ago
Starred by Lewis Tunstall (Researcher at Hugging Face), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 5 more.

torchtune by pytorch

0.2% · 5k stars
PyTorch library for LLM post-training and experimentation
created 1 year ago · updated 23 hours ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Michael Han (Cofounder of Unsloth), and 1 more.

ktransformers by kvcache-ai

0.4% · 15k stars
Framework for LLM inference optimization experimentation
created 1 year ago · updated 2 days ago