cache-dit by vipshop

Accelerate diffusion transformer inference with unified caching

Created 3 months ago · 316 stars · Top 85.4% on SourcePulse

View on GitHub
Project Summary

A Unified Cache Acceleration Toolbox for 🤗Diffusers: FLUX.1, Qwen-Image-Edit, Qwen-Image, Qwen-Image-Lightning, Wan 2.1/2.2, etc.

cache-dit is a Python toolbox for accelerating Diffusion Transformer (DiT) models within 🤗Diffusers. It offers training-free cache acceleration via techniques such as DBCache and TaylorSeer, significantly speeding up inference. Aimed at researchers and engineers, it provides a unified API for easy integration across numerous DiT architectures.

How It Works

The library reduces redundant computation across denoising steps through caching, and its unified API (cache_dit.enable_cache) makes integration a one-call change. Key techniques include DBCache, which balances speed against precision through configurable compute blocks (Fn, Bn), and Hybrid TaylorSeer, which uses Taylor-series expansion to preserve accuracy at larger cache steps. CFG caching and torch.compile compatibility are also supported.
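To make the block-level idea concrete, here is a toy sketch of dual-block caching under my own simplifying assumptions (callable blocks, a single residual cached for the middle span); it illustrates the Fn/Bn trade-off described above, not cache-dit's actual implementation.

```python
import torch

# Toy illustration of the dual-block caching idea (not the library's code):
# always compute the first Fn and last Bn transformer blocks, and on
# cache-hit steps reuse a stored residual for the middle blocks.
def run_blocks(blocks: list, x: torch.Tensor, Fn: int, Bn: int,
               cache: dict, use_cache: bool) -> torch.Tensor:
    for block in blocks[:Fn]:                  # first Fn blocks: always computed
        x = block(x)
    if use_cache and cache.get("mid") is not None:
        x = x + cache["mid"]                   # reuse cached middle-block residual
    else:
        mid_in = x
        for block in blocks[Fn:len(blocks) - Bn]:
            x = block(x)
        cache["mid"] = x - mid_in              # refresh the cached residual
    for block in blocks[len(blocks) - Bn:]:    # last Bn blocks: always computed
        x = block(x)
    return x
```

Larger Fn/Bn values recompute more blocks per step (higher precision, less speedup); smaller values lean harder on the cache.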

Quick Start & Requirements

Install via pip: pip install -U cache-dit. Requires Python, 🤗Diffusers, and PyTorch; a CUDA-capable GPU is recommended for meaningful speedups. Examples and documentation in the repository detail integration for specific models.
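A minimal quick-start sketch: cache_dit.enable_cache is the API named in this summary, while the model id, prompt, and generation settings below are illustrative choices of mine, not a documented recipe.

```python
import torch
import cache_dit
from diffusers import DiffusionPipeline

# Load a supported DiT pipeline; FLUX.1-dev is one of the models the
# project lists, used here as an illustrative choice.
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
).to("cuda")

# One-call, training-free cache acceleration via the unified API named
# in this summary; keyword options for DBCache/TaylorSeer tuning are
# documented in the repository and not guessed at here.
cache_dit.enable_cache(pipe)

image = pipe(
    "a cinematic photo of a snow leopard at dusk",
    num_inference_steps=28,
).images[0]
image.save("flux_cached.png")
```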

Highlighted Details

  • Supports numerous DiT models (Qwen-Image, FLUX.1, Wan 2.1/2.2, SD 3/3.5, etc.).
  • Achieves significant speedups (up to 3.3x reported) with configurations combining FP8 quantization and torch.compile (see the sketch after this list).
  • Features an Automatic Block Adapter for custom Transformer blocks.
  • Includes a CLI for evaluating accuracy metrics (PSNR, FID).
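A hedged sketch of pairing the cache with torch.compile, the combination the reported speedups build on. Compiling only the transformer backbone is a common Diffusers pattern assumed here, not a documented cache-dit recipe, and the FP8 quantization part of the 3.3x figure is omitted.

```python
import torch
import cache_dit
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Enable caching first, then compile the DiT backbone. Compiling only
# the transformer (rather than the whole pipeline) is a common Diffusers
# pattern; treat this pairing as illustrative, not prescriptive.
cache_dit.enable_cache(pipe)
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune")

image = pipe("a watercolor city skyline").images[0]
```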

Maintenance & Community

The project is primarily associated with Vipshop (vipshop.com). Community contributions are encouraged via GitHub stars and CONTRIBUTE.md. No dedicated community channels or roadmap details are provided.

Licensing & Compatibility

The license type is not specified in the provided README. Compatible with 🤗Diffusers and torch.compile.

Limitations & Caveats

Unified cache APIs are experimental. torch.compile with dynamic shapes may require torch._dynamo recompile limit adjustments. Project authorship appears concentrated, potentially indicating a low bus factor.
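For the recompile-limit caveat, the usual workaround is raising Dynamo's recompilation cap before compiling; the config attribute is cache_size_limit in older PyTorch releases (renamed recompile_limit later), so treat the exact name and value below as version-dependent assumptions.

```python
import torch

# Raise Dynamo's recompilation cap so dynamic-shape runs do not silently
# fall back to eager once the default limit is hit. The attribute name
# varies by PyTorch version (cache_size_limit vs. recompile_limit);
# the value 64 is an arbitrary illustrative choice.
torch._dynamo.config.cache_size_limit = 64
```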

Health Check

  • Last Commit: 20 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 99
  • Issues (30d): 17
  • Star History: 148 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Jeremy Howard (Cofounder of fast.ai).

GPTFast by MDK8888

687 stars
HF Transformers accelerator for faster inference
Created 1 year ago · Updated 1 year ago
Starred by Chaoyu Yang (Founder of Bento), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

nunchaku by nunchaku-tech

3k stars
High-performance 4-bit diffusion model inference engine
Created 10 months ago · Updated 2 days ago
Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI) and Cody Yu (Coauthor of vLLM; MTS at OpenAI).

xDiT by xdit-project

2k stars
Inference engine for parallel Diffusion Transformer (DiT) deployment
Created 1 year ago · Updated 1 day ago