xDiT by xdit-project

Inference engine for parallel Diffusion Transformer (DiT) deployment

Created 1 year ago
2,279 stars

Top 19.9% on SourcePulse

Project Summary

xDiT is a scalable inference engine designed to accelerate Diffusion Transformer (DiT) models for image and video generation. It addresses the quadratic complexity of attention mechanisms in DiTs, enabling efficient deployment across multiple GPUs and machines for real-time applications. The engine targets researchers and developers working with large-scale DiT models, offering significant performance gains through advanced parallelism and single-GPU acceleration techniques.

How It Works

xDiT employs a hybrid parallelism strategy that combines Unified Sequence Parallelism (USP), PipeFusion (sequence-level pipeline parallelism), CFG Parallel, and Data Parallel. USP is a novel approach that unifies DeepSpeed-Ulysses and Ring-Attention for efficient sequence parallelism. PipeFusion exploits the temporal redundancy between adjacent diffusion steps to pipeline computation across devices. These methods can be hybridized under one constraint: the product of the parallel degrees must equal the total number of devices. Additionally, xDiT accelerates single-GPU inference through kernel optimizations, compilation (torch.compile, onediff), and cache acceleration (TeaCache, First-Block-Cache, DiTFastAttn), all of which exploit computational redundancies across diffusion steps.
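To make the degree constraint concrete, here is a minimal Python sketch. The helper is illustrative only, not part of xDiT's API; the degree names mirror the flags referenced in the Quick Start section below.

    # Illustrative helper (not xDiT's API): the product of all parallel
    # degrees must equal the number of devices in the mesh.
    def validate_hybrid_config(num_devices: int,
                               ulysses_degree: int = 1,
                               ring_degree: int = 1,
                               pipefusion_parallel_degree: int = 1,
                               cfg_degree: int = 1,
                               data_parallel_degree: int = 1) -> None:
        product = (ulysses_degree * ring_degree * pipefusion_parallel_degree
                   * cfg_degree * data_parallel_degree)
        if product != num_devices:
            raise ValueError(
                f"parallel degrees multiply to {product}, "
                f"but {num_devices} devices are available"
            )

    # 8 GPUs tiled as 2-way Ulysses x 2-way PipeFusion x 2-way CFG parallel.
    validate_hybrid_config(8, ulysses_degree=2,
                           pipefusion_parallel_degree=2, cfg_degree=2)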

Quick Start & Requirements

  • Installation: pip install xfuser or pip install "xfuser[diffusers,flash-attn]" for optional dependencies. Install from source with pip install -e . or pip install -e ".[diffusers,flash-attn]". Docker image available: thufeifeibear/xdit-dev.
  • Prerequisites: flash-attn (>= 2.6.0 recommended for optimal GPU performance, fallback available for NPU compatibility). diffusers is optional but recommended for many models.
  • Usage: Examples are provided in ./examples/; run with bash examples/run.sh. Hybrid parallelism requires careful configuration of degrees (e.g., ulysses_degree * pipefusion_parallel_degree * cfg_degree == num_devices); see the launch sketch after this list.
  • Links: Papers, Quick Start, Supported DiTs, Dev Guide, Discussion.
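A typical multi-GPU launch then looks like the sketch below. The example script name and flag spellings are assumptions inferred from the degree formula above and may differ between versions; examples/run.sh and the files in ./examples/ are authoritative.

    # Hypothetical launch on 8 GPUs: 2-way Ulysses x 2-way PipeFusion x 2-way CFG.
    # Flag names follow the degree formula above; verify against examples/run.sh.
    torchrun --nproc_per_node=8 examples/pixartalpha_example.py \
        --model <path-to-model> \
        --ulysses_degree 2 \
        --pipefusion_parallel_degree 2 \
        --use_cfg_parallel \
        --prompt "A small dog"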

Highlighted Details

  • Supports a wide range of DiT models including StepVideo, HunyuanVideo, PixArt-Sigma, and Stable Diffusion 3.
  • Pioneers USP and PipeFusion for efficient sequence and pipeline parallelism, respectively.
  • Offers hybrid parallelism to combine multiple strategies for optimal scaling.
  • Includes single-GPU acceleration via compilation (torch.compile, onediff) and cache methods.
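As a concrete illustration of the compilation path, the sketch below wraps a diffusers DiT backbone with torch.compile. This is generic single-GPU diffusers usage, assuming a pipeline that exposes a transformer attribute (the model ID is just an example); it is not xDiT's specific integration.

    # Minimal single-GPU sketch: compile only the DiT backbone, which
    # dominates inference time. Generic diffusers usage, not xDiT-specific.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",  # example DiT model
        torch_dtype=torch.bfloat16,
    ).to("cuda")

    # Compile the transformer; the VAE and text encoders are comparatively cheap.
    pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

    image = pipe("a watercolor fox", num_inference_steps=28).images[0]
    image.save("fox.png")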

Maintenance & Community

  • Active development; a major API upgrade landed in August 2024.
  • Community Discord server available: https://discord.gg/YEWzWfCF9S.
  • Actively seeking contributions for new features and models.

Licensing & Compatibility

  • The license is not explicitly stated in the README; consult the repository's LICENSE file directly. Compatibility with commercial use or closed-source linking is likewise not documented.

Limitations & Caveats

  • The legacy APIs do not support hybrid parallelism; users are strongly encouraged to migrate to the new APIs.
  • Cache acceleration methods are currently supported only for the FLUX model with USP, not with PipeFusion.
  • Certain models require specific diffusers versions, so some dependency version management may be necessary.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 8
  • Issues (30d): 9
  • Star History: 76 stars in the last 30 days

Explore Similar Projects

  • oslo by tunib-ai: Framework for large-scale transformer optimization. 309 stars; created 3 years ago, last updated 3 years ago. Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

  • parallelformers by tunib-ai: Toolkit for easy model parallelization. 790 stars; created 4 years ago, last updated 2 years ago. Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

  • ctransformers by marella: Python bindings for fast Transformer model inference. 2k stars; created 2 years ago, last updated 1 year ago. Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 11 more.