Inference engine for parallel Diffusion Transformer (DiT) deployment
xDiT is a scalable inference engine designed to accelerate Diffusion Transformer (DiT) models for image and video generation. It addresses the quadratic complexity of attention mechanisms in DiTs, enabling efficient deployment across multiple GPUs and machines for real-time applications. The engine targets researchers and developers working with large-scale DiT models, offering significant performance gains through advanced parallelism and single-GPU acceleration techniques.
How It Works
xDiT employs a hybrid parallelism strategy, combining techniques like Unified Sequence Parallelism (USP), PipeFusion (sequence-level pipeline parallelism), CFG Parallel, and Data Parallel. USP is a novel approach that unifies DeepSpeed-Ulysses and Ring-Attention for efficient sequence parallelism. PipeFusion leverages temporal redundancy in diffusion models for pipeline parallelism. These methods can be hybridized, with the product of parallel degrees matching the total number of devices. Additionally, xDiT incorporates single-GPU acceleration through kernel optimizations, compilation acceleration (torch.compile, onediff), and cache acceleration (TeaCache, First-Block-Cache, DiTFastAttn) to exploit computational redundancies.
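The constraint that the product of the parallel degrees matches the device count can be checked before launching a job. The following is a minimal sketch, not xDiT's own code: the HybridConfig type and both helper functions are hypothetical, ring_degree is an assumed name for the Ring-Attention part of USP, and data parallelism is left out for brevity. It also assumes that the CFG degree is either 1 or 2, since classifier-free guidance has one conditional and one unconditional pass.

```python
from itertools import product
from math import prod
from typing import NamedTuple


class HybridConfig(NamedTuple):
    # Hypothetical container; ring_degree is an assumed name, the other
    # fields follow the degree names used in this document.
    ulysses_degree: int
    ring_degree: int
    pipefusion_parallel_degree: int
    cfg_degree: int


def check_degrees(config: HybridConfig, num_devices: int) -> None:
    """Raise ValueError unless the product of all degrees equals num_devices."""
    total = prod(config)
    if total != num_devices:
        raise ValueError(
            f"product of degrees is {total}, expected {num_devices}: {config}"
        )


def valid_hybrid_configs(num_devices: int) -> list[HybridConfig]:
    """Enumerate every degree combination that uses exactly num_devices devices."""
    divisors = [d for d in range(1, num_devices + 1) if num_devices % d == 0]
    # Assumption: CFG parallelism is either off (1) or splits the
    # conditional and unconditional passes across two groups (2).
    cfg_choices = (1, 2)
    return [
        HybridConfig(u, r, p, c)
        for u, r, p, c in product(divisors, divisors, divisors, cfg_choices)
        if u * r * p * c == num_devices
    ]


if __name__ == "__main__":
    # List the candidate layouts for an 8-GPU node.
    for config in valid_hybrid_configs(8):
        print(config)
```

Enumerating the candidates first makes it easier to benchmark a few layouts per model, since the best split between sequence, pipeline, and CFG parallelism likely depends on the model and the interconnect.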
Quick Start & Requirements
Install with pip install xfuser, or with pip install "xfuser[diffusers,flash-attn]" to include optional dependencies. To install from source, use pip install -e . or pip install -e ".[diffusers,flash-attn]". A Docker image is available: thufeifeibear/xdit-dev.
flash-attn >= 2.6.0 is recommended for optimal GPU performance; a fallback is available for NPU compatibility. diffusers is optional but recommended for many models.
Examples are in ./examples/. Run them with bash examples/run.sh. Hybrid parallelism requires careful configuration of degrees (e.g., ulysses_degree * pipefusion_parallel_degree * cfg_degree == num_devices).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Specific diffusers versions may be required for certain models, which may call for version management.