diffusers-torchao by sayakpaul

Recipes for optimizing diffusion models with torchao and diffusers

Created 1 year ago · 370 stars · Top 77.5% on sourcepulse

Project Summary

This repository provides end-to-end recipes for optimizing diffusion models using torchao and Hugging Face diffusers, enabling faster inference and experimental FP8 training. It targets researchers and engineers working with large diffusion models who need to reduce computational costs and latency. The primary benefit is significant speedups and memory savings through quantization and compilation.

How It Works

The project leverages torchao for quantization (e.g., INT8, FP8, FP6, FP4) and torch.compile() for graph optimization, and demonstrates how to apply these techniques to popular diffusion models such as Flux and CogVideoX. torchao's quantization is applied directly to modules inside diffusers pipelines, giving fine-grained control over quantization schemes and compilation modes to trade off speed, memory use, and output quality.

Quick Start & Requirements

  • Install: pip install -e . (from the cloned repository)
  • Prerequisites: PyTorch nightly, diffusers nightly, torchao nightly, CUDA 12.2+. Experiments were conducted on NVIDIA A100 and H100 GPUs.
  • Setup: Requires cloning the repository and installing dependencies.
  • More Info: Diffusers Documentation, TorchAO Documentation
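Putting the bullets together, a setup might look like the following. The repository URL is inferred from the project name and author, and the nightly install commands are generic PyTorch/diffusers/torchao patterns rather than lines from the README; adjust the CUDA tag to your driver.

```shell
# Nightly PyTorch (CUDA 12.2+ required by the recipes)
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121

# diffusers and torchao from source (nightly)
pip install git+https://github.com/huggingface/diffusers
pip install git+https://github.com/pytorch/ao

# Clone and install the recipes themselves
git clone https://github.com/sayakpaul/diffusers-torchao
cd diffusers-torchao
pip install -e .
```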

Highlighted Details

  • Achieves up to 53.88% speedup on Flux.1-Dev and 33.04% on CogVideoX-5b on H100 GPUs compared to standard bf16.
  • Demonstrates significant memory reduction, e.g., CogVideoX-5b requiring ~10.3 GB model memory with INT8 weight-only quantization vs. ~19.7 GB for bf16.
  • Explores various quantization dtypes (INT8, FP8, FP6, FP4) and their impact on speed, memory, and quality.
  • Integrates torch.compile() for further performance gains, including strategies to avoid graph breaks.

Maintenance & Community

  • Actively developed with contributions acknowledged from the PyTorch team.
  • torchao is being integrated as an official quantization backend in diffusers.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. diffusers is typically Apache 2.0 licensed, and torchao is typically BSD 3-Clause licensed. Compatibility for commercial use is likely, but specific terms should be verified.

Limitations & Caveats

  • Experimental FP8 training is mentioned but not detailed.
  • Semi-structured sparsity with INT8 dynamic quantization can significantly degrade image quality.
  • Quantization support is best on Ampere and newer architectures; Turing/Volta and Apple MPS backends may have issues or offer no benefits.
  • Benchmarking scripts can be time-consuming due to compilation and warmup runs.
Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 27 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jaret Burkett (founder of Ostris), and 1 more.

nunchaku by nunchaku-tech — high-performance 4-bit diffusion model inference engine. 3k stars; created 8 months ago; updated 17 hours ago.