Pipeline parallel training script for diffusion models
This project provides a pipeline-parallel training script for diffusion models, targeting researchers and practitioners needing to train large models that exceed single-GPU memory. It offers efficient multi-GPU training with features like checkpointing, pre-caching, and unified support for image and video models, simplifying the process of training advanced generative AI.
How It Works
The script leverages DeepSpeed's pipeline parallelism to partition model layers across multiple GPUs, enabling training of models too large for a single device. It incorporates hybrid data and pipeline parallelism, allowing flexible configuration of model distribution. Key optimizations include pre-caching latents and text embeddings to disk, freeing up VRAM by offloading VAE and text encoders during training.
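To illustrate how hybrid data and pipeline parallelism divides work, here is a minimal, self-contained sketch (a hypothetical helper, not code from this repository) that maps global GPU ranks to a data-parallel replica and a pipeline stage:

```python
def rank_layout(world_size, pipeline_stages):
    """Map each global rank to (data-parallel replica, pipeline stage).

    In hybrid parallelism, the world_size GPUs are grouped into
    data-parallel replicas, and each replica spans pipeline_stages GPUs,
    each holding one contiguous slice of the model's layers.
    """
    assert world_size % pipeline_stages == 0, "world size must divide evenly into stages"
    layout = {}
    for rank in range(world_size):
        replica = rank // pipeline_stages  # which data-parallel copy of the model
        stage = rank % pipeline_stages     # which slice of the model this GPU holds
        layout[rank] = (replica, stage)
    return layout

# Example: 8 GPUs with the model split across 4 pipeline stages
# yields 2 data-parallel replicas of 4 stages each.
print(rank_layout(8, 4))
```

The actual stage-to-rank assignment is handled internally by DeepSpeed; this sketch only shows the arithmetic behind configuring how many GPUs go to pipeline depth versus data-parallel width.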
Quick Start & Requirements
Clone the repository with submodules (git clone --recurse-submodules), create a Conda environment (conda create -n diffusion-pipe python=3.12), activate it (conda activate diffusion-pipe), and install dependencies (pip install -r requirements.txt).
Highlighted Details
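The setup steps can be run as the following command sequence (the repository URL is not given in the source, so a placeholder is used; the checkout directory name is an assumption):

```shell
# Clone with submodules (substitute the actual repository URL)
git clone --recurse-submodules <repo-url>
cd diffusion-pipe  # assumed checkout directory name

# Create and activate the Conda environment
conda create -n diffusion-pipe python=3.12
conda activate diffusion-pipe

# Install Python dependencies
pip install -r requirements.txt
```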
Maintenance & Community
This is noted as a side project with limited developer time. Recent updates show community contributions (PRs) for new models and features.
Licensing & Compatibility
The repository does not explicitly state a license in the README.
Limitations & Caveats
Native Windows support is difficult or impossible due to DeepSpeed's limited Windows compatibility; WSL 2 is recommended instead. Because text embeddings are pre-cached to disk, text encoder LoRA training is not currently supported. Resuming training requires passing the original config file on the command line.