Comfy-WaveSpeed by chengzeyi

Inference optimization solution for ComfyUI

created 8 months ago
1,104 stars

Top 35.2% on sourcepulse

Project Summary

This project provides inference optimization for ComfyUI, targeting users seeking faster image and video generation. It offers universal, flexible, and fast solutions through dynamic caching and enhanced torch.compile integration, aiming to significantly reduce computation costs and generation times.

How It Works

The optimization relies on two main techniques. "First Block Cache" (FBCache) uses the residual output of the first transformer block as a cheap indicator of change between denoising steps: if that residual is sufficiently similar to the one from the previous step, the cached results of the remaining blocks are reused and their computation is skipped, yielding up to a 2x speedup. "Enhanced torch.compile" compiles model components for faster execution and, unlike the original TorchCompileModel node, supports LoRA models.
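As a rough illustration of the caching idea, here is a minimal sketch in plain PyTorch, not the project's actual node code; the class name, the threshold value, and the first_block/remaining_blocks arguments are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class FirstBlockCache:
    """Sketch of first-block caching (illustrative, not the project's real API)."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold
        self.prev_first_residual = None   # first-block residual from the previous step
        self.cached_tail_residual = None  # cached contribution of the remaining blocks

    def forward(self, hidden, first_block, remaining_blocks):
        first_out = first_block(hidden)
        first_residual = first_out - hidden

        if self.prev_first_residual is not None:
            # Relative change of the first-block residual between denoising steps.
            num = (first_residual - self.prev_first_residual).abs().mean()
            den = self.prev_first_residual.abs().mean() + 1e-8
            if num / den < self.threshold:
                # Similar enough: skip the remaining blocks, reuse their cached effect.
                self.prev_first_residual = first_residual
                return first_out + self.cached_tail_residual

        # Otherwise run the rest of the model and refresh the cache.
        out = first_out
        for block in remaining_blocks:
            out = block(out)
        self.cached_tail_residual = out - first_out
        self.prev_first_residual = first_residual
        return out


# Toy usage with stand-in blocks; a real model would pass its transformer blocks.
blocks = nn.ModuleList(nn.Linear(16, 16) for _ in range(4))
cache = FirstBlockCache(threshold=0.1)
x = torch.randn(2, 16)
y = cache.forward(x, blocks[0], blocks[1:])
```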

Quick Start & Requirements

  • Install via git clone into ComfyUI's custom_nodes directory.
  • Requires ComfyUI.
  • The torch.compile node has specific software and hardware requirements; see the Enhanced torch.compile section. FP8 quantization with torch.compile is not supported on pre-Ada GPUs (e.g., RTX 3090), and torch.compile is not officially supported on Windows.
  • Demo workflows are available in the workflows folder.

Highlighted Details

  • First Block Cache (FBCache) offers 1.5x to 3.0x speedup with acceptable accuracy loss.
  • Supports various models including FLUX, LTXV, HunyuanVideo, SD3.5, and SDXL.
  • The Enhanced torch.compile node works with LoRA models (a usage sketch follows this list).
  • FBCache is incompatible with the FreeU Advanced node pack for SDXL.
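The Enhanced torch.compile node builds on PyTorch's standard torch.compile API. A hedged sketch of that underlying call on a stand-in module follows; the toy network and the "max-autotune" mode are illustrative assumptions, not the node's actual settings.

```python
import torch
import torch.nn as nn

# Stand-in module; in ComfyUI the node compiles the loaded diffusion model's
# components rather than a toy network like this.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

# torch.compile returns a wrapper that traces and compiles on first call.
# "max-autotune" trades longer warm-up for faster steady-state kernels.
compiled = torch.compile(model, mode="max-autotune")
out = compiled(torch.randn(1, 64))
```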

Maintenance & Community

  • The project is marked as [WIP] (work in progress).
  • Users are encouraged to join the Discord server for requests and questions.

Licensing & Compatibility

  • License is not explicitly stated in the README.

Limitations & Caveats

  • Multi-GPU inference is listed as a future feature ([WIP]).
  • torch.compile may have issues with model offloading and requires specific configuration for optimal performance and to avoid recompilation (see the sketch after this list).
  • FP8 quantization with torch.compile is not supported on older GPUs.
  • torch.compile is not officially supported on Windows.
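On the recompilation point, this is a hedged sketch of generic PyTorch knobs that often help; the values are illustrative and are not taken from this project's documentation.

```python
import torch

# Generic PyTorch settings that commonly reduce recompilation churn.
torch._dynamo.config.cache_size_limit = 64  # allow more compiled graph variants

def compile_for_varied_resolutions(model):
    # dynamic=True requests shape-polymorphic kernels, so changing latent
    # resolutions does not force a fresh compilation each time.
    return torch.compile(model, dynamic=True)
```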
Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 121 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (founder of Ostris), and 1 more:

nunchaku by nunchaku-tech

  • High-performance 4-bit diffusion model inference engine
  • 3k stars (top 2.1%)
  • created 8 months ago, updated 19 hours ago

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 1 more:

SageAttention by thu-ml

  • Attention kernel for plug-and-play inference acceleration
  • 2k stars (top 2.4%)
  • created 10 months ago, updated 1 week ago