Recipes for optimizing diffusion models with torchao and diffusers
This repository provides end-to-end recipes for optimizing diffusion models using `torchao` and Hugging Face `diffusers`, enabling faster inference and experimental FP8 training. It targets researchers and engineers working with large diffusion models who need to reduce computational cost and latency. The primary benefits are significant speedups and memory savings through quantization and compilation.
## How It Works
The project leverages `torchao` for quantization (e.g., INT8, FP8, FP6, FP4) and `torch.compile()` for graph optimization, and demonstrates how to apply these techniques to popular diffusion models such as Flux and CogVideoX. `torchao`'s quantization is applied directly to `diffusers` pipelines, giving fine-grained control over quantization schemes and compilation modes to balance performance against memory footprint.
## Quick Start & Requirements
- Install from the cloned repository: `pip install -e .`
- Requirements: `diffusers` nightly, `torchao` nightly, CUDA 12.2+.
- Experiments were conducted on NVIDIA A100 and H100 GPUs.

## Highlighted Details
- `torch.compile()` support for further performance gains, including strategies to avoid graph breaks.

## Maintenance & Community
`torchao` is being integrated as an official quantization backend in `diffusers`.

## Licensing & Compatibility
`diffusers` is Apache 2.0 licensed and `torchao` is BSD 3-Clause licensed. Compatibility with commercial use is likely, but verify the specific terms of each dependency.

## Limitations & Caveats