sayakpaul: Recipes for optimizing diffusion models with torchao and diffusers
This repository provides end-to-end recipes for optimizing diffusion models using torchao and Hugging Face diffusers, enabling faster inference and experimental FP8 training. It targets researchers and engineers working with large diffusion models who need to reduce computational costs and latency. The primary benefit is significant speedups and memory savings through quantization and compilation.
How It Works
The project leverages torchao for quantization (e.g., INT8, FP8, FP6, FP4) and torch.compile() for graph optimization. It demonstrates how to apply these techniques to popular diffusion models like Flux and CogVideoX. The approach involves integrating torchao's quantization capabilities directly into diffusers pipelines, allowing for fine-grained control over quantization schemes and compilation modes to achieve optimal performance and memory footprints.
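To make the quantization idea concrete, here is a minimal, dependency-free sketch of symmetric INT8 weight-only quantization — the basic arithmetic behind schemes like torchao's INT8 weight-only mode. This is illustrative only: torchao's real implementation works per channel on tensors with optimized kernels and layouts, not on Python lists.

```python
def quantize_int8(weights):
    """Symmetric INT8 quantization: map floats into [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the INT8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Every quantized value fits in a signed 8-bit range, and the
# round-trip error is bounded by half a quantization step.
assert all(-127 <= v <= 127 for v in q)
assert all(abs(w - r) <= scale / 2 for w, r in zip(weights, restored))
```

Storing weights as INT8 plus a scale is what yields the memory savings; the speedups come from running matmuls on the low-precision representation.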
Quick Start & Requirements
Install from the cloned repository with pip install -e .. Requires diffusers nightly, torchao nightly, and CUDA 12.2+. Experiments were conducted on NVIDIA A100 and H100 GPUs.

Highlighted Details
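One highlighted technique is avoiding graph breaks under torch.compile(). As a hedged toy sketch (CPU tensors and the eager backend, to sidestep compiler toolchain requirements; the repository's actual strategies target the real pipeline models): data-dependent Python control flow forces a break, while torch.where keeps the computation in a single traced graph.

```python
import torch

def with_branch(x):
    # A data-dependent Python `if` forces torch.compile to break the
    # graph at the condition and fall back to eager execution between
    # the two graph fragments.
    if x.sum() > 0:
        return x * 2
    return x - 1

def without_branch(x):
    # torch.where keeps the whole computation inside one traced graph,
    # so compiling with fullgraph=True succeeds.
    return torch.where(x.sum() > 0, x * 2, x - 1)

compiled = torch.compile(without_branch, fullgraph=True, backend="eager")
x = torch.randn(8)
assert torch.equal(compiled(x), without_branch(x))
```

Compiling with_branch with fullgraph=True would instead raise an error at the data-dependent condition, which is the signal these recipes use to locate and eliminate break points.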
The recipes pair quantization with torch.compile() for further performance gains, including strategies to avoid graph breaks.

Maintenance & Community
torchao is being integrated as an official quantization backend in diffusers.

Licensing & Compatibility
diffusers is typically Apache 2.0 licensed, and torchao is typically BSD 3-Clause licensed. Commercial use is likely compatible, but specific terms should be verified.

Limitations & Caveats
FP8 training support is experimental, and the recipes depend on nightly builds of diffusers and torchao.