Inference optimization framework for HuggingFace Diffusers
This project provides an ultra-lightweight inference optimization framework for HuggingFace Diffusers on NVIDIA GPUs, targeting users who need to maximize inference speed and efficiency. It offers significantly faster compilation times than alternatives such as TensorRT or torch.compile, while supporting dynamic shapes, LoRA, and ControlNet out of the box.
How It Works
Stable-fast employs several key techniques to achieve its performance gains: CUDNN convolution fusion, low-precision fused GEMM operations, fused GEGLU kernels, and NHWC GroupNorm optimized with OpenAI's Triton. It also leverages CUDA Graphs to reduce CPU overhead for small batch sizes and dynamic shapes, and integrates xformers' fused multi-head attention in a TorchScript-compatible way. The framework keeps overhead minimal by acting as a plugin for PyTorch, enhancing existing functionality rather than replacing it.
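The optimizations above are applied by compiling an existing Diffusers pipeline. A minimal sketch, based on the usage shown in the project's README (treat the exact module path, config fields, and model ID as assumptions; running it requires a CUDA GPU with torch, diffusers, and stable-fast installed):

```python
def compile_pipeline(model_id="runwayml/stable-diffusion-v1-5"):
    """Load a Diffusers pipeline and wrap it with stable-fast's compiler."""
    # Imports are deferred so the sketch can be read without the
    # GPU-only dependencies installed.
    import torch
    from diffusers import StableDiffusionPipeline
    from sfast.compilers.diffusion_pipeline_compiler import (
        compile, CompilationConfig)

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16)
    pipe.to("cuda")

    config = CompilationConfig.Default()
    config.enable_xformers = True    # fused multi-head attention
    config.enable_triton = True      # Triton kernels (e.g. NHWC GroupNorm)
    config.enable_cuda_graph = True  # capture graphs to cut CPU launch overhead
    return compile(pipe, config)
```

The compiled pipeline is then called like a normal Diffusers pipeline; the first call triggers compilation, and subsequent calls reuse the optimized kernels.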
Quick Start & Requirements
pip3 install --index-url https://download.pytorch.org/whl/cu121 'torch>=2.1.0' 'xformers>=0.0.22' 'triton>=2.1.0' 'diffusers>=0.19.3'
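Prebuilt wheels may not match every environment, so the project can also be built from source. A hedged sketch using pip's Git support (the branch ref is an assumption):

```shell
# Build stable-fast from source. Requires the CUDA toolkit with
# CUDNN/CUBLAS headers; installing Ninja first speeds up compilation.
pip3 install -v -U 'git+https://github.com/chengzeyi/stable-fast.git@main#egg=stable-fast'
```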
Installation from source requires CUDNN/CUBLAS and optionally Ninja.
Highlighted Details
The project reports performance benchmarks against torch.compile on RTX 4080, H100, and A100 GPUs.
Maintenance & Community
Active development on stable-fast has been paused, with the author focusing on a new torch.dynamo-based project for newer models and broader hardware support. A Discord channel is available for community support.
Licensing & Compatibility
The project appears to be under a permissive license, though specific details are not explicitly stated in the README. It is compatible with various HuggingFace Diffusers versions, ControlNet, LoRA, LCM, SDXL Turbo, and Stable Video Diffusion.
Limitations & Caveats
The project's active development has been paused in favor of a successor project. Compatibility with PyTorch versions outside the tested minimum (torch>=2.1.0) is not guaranteed. Progress-bar timing may be inaccurate because CUDA execution is asynchronous.