diffusion-pipe by tdrussell

Pipeline parallel training script for diffusion models

Created 1 year ago
1,556 stars

Top 26.8% on SourcePulse

Project Summary

This project provides a pipeline-parallel training script for diffusion models, aimed at researchers and practitioners who need to train models too large for a single GPU's memory. It supports multi-GPU training with checkpointing, pre-caching of latents and text embeddings, and unified handling of image and video models.

How It Works

The script uses DeepSpeed's pipeline parallelism to partition model layers across multiple GPUs, which makes it possible to train models too large for a single device. Pipeline parallelism can be combined with data parallelism, so the way the model is distributed across GPUs is configurable. A key optimization is pre-caching latents and text embeddings to disk, which frees VRAM because the VAE and text encoders are offloaded during training.
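As a rough illustration of the idea, the sketch below splits a model's layers into contiguous pipeline stages and computes how many data-parallel replicas fit on a given number of GPUs. It is not DeepSpeed's partitioning logic or this project's code; the function names `partition_layers` and `data_parallel_degree` are hypothetical, and the even-split rule is an assumption made for the example.

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Split layer indices into contiguous, near-equal pipeline stages.

    Assumption for illustration: layers are divided as evenly as possible,
    with any remainder going to the earliest stages.
    """
    if num_stages < 1 or num_layers < num_stages:
        raise ValueError("need at least one layer per stage")
    base, extra = divmod(num_layers, num_stages)
    stages = []
    start = 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


def data_parallel_degree(num_gpus: int, pipeline_stages: int) -> int:
    """Number of model replicas when each replica spans `pipeline_stages` GPUs."""
    if pipeline_stages < 1 or num_gpus % pipeline_stages != 0:
        raise ValueError("num_gpus must be a multiple of pipeline_stages")
    return num_gpus // pipeline_stages


if __name__ == "__main__":
    # Each stage would live on its own GPU within one replica.
    for gpu, layers in enumerate(partition_layers(num_layers=10, num_stages=4)):
        print(f"stage {gpu}: layers {list(layers)}")
    # 8 GPUs with 4 pipeline stages leaves room for 2 data-parallel replicas.
    print("replicas:", data_parallel_degree(num_gpus=8, pipeline_stages=4))
```

With hybrid parallelism, the total GPU count is the product of the pipeline stage count and the number of data-parallel replicas, which is why the second helper requires an exact multiple.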

Quick Start & Requirements

  • Install: Clone repo with submodules (git clone --recurse-submodules), create Conda environment (conda create -n diffusion-pipe python=3.12), activate (conda activate diffusion-pipe), install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.12, CUDA (matching PyTorch), GCC 12 (for TransformerEngine), cuDNN. TransformerEngine is required for Cosmos.
  • Setup: Requires careful environment setup, especially for TransformerEngine.
  • Docs: Supported Models

Highlighted Details

  • Supports a wide range of models including SDXL, Flux, LTX-Video, HunyuanVideo, Cosmos, Lumina, Wan, and Chroma.
  • Features block swapping and NF4 quantization to significantly reduce VRAM usage, enabling LoRA training on a single RTX 4090 GPU.
  • Offers unified support for both image and video models, with flexible configuration via TOML files.
  • Includes efficient multi-process, multi-GPU pre-caching of latents and text embeddings.
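The pre-caching idea can be sketched in a few lines: run the expensive encoder once per item, store the result on disk, and read it back during training so the encoder itself never needs to be loaded. This is a minimal single-process sketch, not the project's implementation; `precache`, `load_cached`, and `fake_encode` are hypothetical names, and `fake_encode` is a stand-in for a real VAE or text encoder.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def cache_key(item: str) -> str:
    """Stable file name derived from the item's content."""
    return hashlib.sha256(item.encode("utf-8")).hexdigest()


def precache(items: Iterable[str], encode: Callable[[str], list], cache_dir: Path) -> int:
    """Encode each item once and store it on disk; return how many were newly encoded."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    computed = 0
    for item in items:
        path = cache_dir / f"{cache_key(item)}.pkl"
        if not path.exists():  # skip items cached by an earlier run
            with path.open("wb") as f:
                pickle.dump(encode(item), f)
            computed += 1
    return computed


def load_cached(item: str, cache_dir: Path) -> list:
    """Read a cached encoding back; the encoder is not needed at this point."""
    with (cache_dir / f"{cache_key(item)}.pkl").open("rb") as f:
        return pickle.load(f)


def fake_encode(caption: str) -> list:
    """Hypothetical stand-in for a text encoder: one number per word."""
    return [float(len(word)) for word in caption.split()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        captions = ["a cat", "a dog on a beach"]
        print("encoded:", precache(captions, fake_encode, cache))
        print("encoded on second pass:", precache(captions, fake_encode, cache))
        print("loaded:", load_cached("a cat", cache))
```

Because training only ever reads the cached outputs, the encoder that produced them cannot receive gradients in this scheme.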

Maintenance & Community

This is noted as a side project with limited developer time. Recent updates show community contributions (PRs) for new models and features.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

Native Windows support is difficult or impossible because DeepSpeed has limited Windows compatibility; WSL 2 is recommended instead. Because text embeddings are pre-cached, text encoder LoRA training is not currently supported. Resuming training requires the same config file that was passed on the command line for the original run.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 31

Star History

111 stars in the last 30 days
