q-diffusion by Xiuyu-Li

Quantization method for diffusion models

Created 2 years ago
354 stars

Top 78.8% on SourcePulse

View on GitHub
Project Summary

Q-Diffusion offers a novel post-training quantization (PTQ) method specifically designed for diffusion models, enabling significant compression (e.g., 4-bit weights) with minimal performance degradation. This is particularly beneficial for researchers and engineers aiming to accelerate inference and reduce the memory footprint of diffusion models for applications like text-to-image generation.

How It Works

Q-Diffusion addresses the unique challenges of quantizing diffusion models, such as varying output distributions across timesteps and bimodal activation distributions in shortcut layers. It employs timestep-aware calibration and split shortcut quantization to maintain accuracy. This approach allows for efficient compression of the noise estimation network without requiring retraining, a significant advantage over traditional PTQ methods that struggle with diffusion model architectures.
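
To make these two mechanisms concrete, the Python sketch below is illustrative only and is not the repository's implementation: quantize_uniform, split_shortcut_quantize, and timestep_aware_calibration_set are hypothetical names, and denoiser / sampler_step stand in for the noise-estimation network and a single reverse-diffusion update.

    import torch

    def quantize_uniform(x, num_bits=8):
        # Min-max uniform fake quantization: an illustrative stand-in for a real PTQ quantizer.
        qmax = 2 ** num_bits - 1
        lo, hi = x.min(), x.max()
        scale = (hi - lo).clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round((x - lo) / scale), 0, qmax)
        return q * scale + lo

    def split_shortcut_quantize(skip_feats, up_feats, num_bits=8):
        # Split shortcut quantization: the two feature maps joined by a UNet skip
        # connection get separate quantization ranges before concatenation, so one
        # quantizer never has to span their bimodal combined distribution.
        return torch.cat([quantize_uniform(skip_feats, num_bits),
                          quantize_uniform(up_feats, num_bits)], dim=1)

    @torch.no_grad()
    def timestep_aware_calibration_set(denoiser, sampler_step, x_T, total_steps=1000, picks=32):
        # Timestep-aware calibration: collect network inputs spaced uniformly over the
        # sampling trajectory so the calibration data reflects every timestep's statistics.
        keep_every = max(total_steps // picks, 1)
        calib, x_t = [], x_T
        for t in reversed(range(total_steps)):
            if t % keep_every == 0:
                calib.append((x_t.clone(), t))
            t_batch = torch.full((x_t.shape[0],), t, device=x_t.device, dtype=torch.long)
            eps = denoiser(x_t, t_batch)      # noise estimate at timestep t
            x_t = sampler_step(x_t, eps, t)   # one reverse-diffusion update
        return calib

Splitting the shortcut keeps each half's quantization error small, and drawing calibration inputs from the whole trajectory avoids fitting the quantizer to a single noise level.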

Quick Start & Requirements

  • Installation: Clone the repository and create a conda environment using conda env create -f environment.yml.
  • Prerequisites: Requires PyTorch and (implicitly, for GPU acceleration) CUDA, as well as pretrained checkpoints from CompVis (e.g., sd-v1-4.ckpt). Quantized checkpoints are available via Google Drive; a minimal environment check is sketched after this list.
  • Usage: Inference scripts are provided for CIFAR-10, LSUN Bedroom, LSUN Churches, and Stable Diffusion, with example commands for 4/8-bit weight-only and mixed-precision quantization. Calibration scripts are also available.
  • Links: website, paper, NVIDIA TensorRT example
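
Before running the commands above, a quick sanity check of the prerequisites can be done in Python; note that the checkpoint path below follows the CompVis Stable Diffusion layout and is an assumption, not a path documented by this repository.

    import os
    import torch

    # Confirm PyTorch is installed and sees a CUDA-capable GPU (assumed for inference).
    assert torch.cuda.is_available(), "No CUDA device found; a GPU is expected."
    print(f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}")

    # Hypothetical checkpoint location; adjust to wherever sd-v1-4.ckpt was downloaded.
    ckpt_path = "models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"
    if not os.path.exists(ckpt_path):
        print(f"Missing {ckpt_path}; download sd-v1-4.ckpt from CompVis first.")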

Highlighted Details

  • Achieves 4-bit quantization with FID changes of at most 2.34, compared to >100 for traditional PTQ.
  • Enables 4-bit Stable Diffusion inference with high generation quality.
  • Features timestep-aware calibration and split shortcut quantization.
  • Compatible with NVIDIA TensorRT.

Maintenance & Community

The project is associated with ICCV 2023. Further community engagement channels are not explicitly mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. It also builds on CompVis models (whose checkpoints, such as Stable Diffusion, carry their own license terms) and integrates with NVIDIA TensorRT, so suitability for commercial use should be verified before adoption.

Limitations & Caveats

The README mentions that calibration datasets are large, but smaller subsets will be uploaded soon. Reproducing calibrated checkpoints requires specific hyperparameters, and deviations may affect performance.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 30 days
