Accelerating diffusion models with predictive feature caching
TaylorSeer accelerates Diffusion Transformer (DiT) models for image and video synthesis by predicting future timestep features using Taylor series expansion, enabling significant speedups without retraining. It targets researchers and developers working with high-fidelity generative models who need to reduce inference latency for real-time applications.
How It Works
TaylorSeer leverages the observation that diffusion model features evolve slowly and continuously across timesteps. It approximates higher-order derivatives of these features to predict future states via Taylor series expansion. This forecasting approach aims to overcome the quality degradation seen in traditional feature caching methods when timestep intervals are large, offering substantial acceleration with minimal impact on generation quality.
Quick Start & Requirements
git clone https://github.com/Shenyi-Z/TaylorSeer.git
Specific implementations for FLUX, HunyuanVideo, DiT, Wan2.1, and HiDream are available in subdirectories.
Maintenance & Community
The project is associated with ICCV 2025 and ICLR 2025 submissions. It acknowledges the upstream model implementations it builds on (DiT, FLUX, HiDream, etc.) and has community contributions such as ComfyUI-TaylorSeer. Contact email: shenyizou@outlook.com.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is presented as research code, with separate implementations for each supported model. While it claims "lossless" or "near-lossless" acceleration, the exact quality metrics and potential trade-offs at higher acceleration ratios may require further investigation. The primary focus is on DiT architectures and related models.