q-diffusion by Xiuyu-Li

Quantization method for diffusion models

Created 2 years ago
354 stars

Top 78.8% on SourcePulse

View on GitHub
Project Summary

Q-Diffusion offers a novel post-training quantization (PTQ) method specifically designed for diffusion models, enabling significant compression (e.g., 4-bit weights) with minimal performance degradation. This is particularly beneficial for researchers and engineers aiming to accelerate inference and reduce the memory footprint of diffusion models for applications like text-to-image generation.

How It Works

Q-Diffusion addresses the unique challenges of quantizing diffusion models, such as varying output distributions across timesteps and bimodal activation distributions in shortcut layers. It employs timestep-aware calibration and split shortcut quantization to maintain accuracy. This approach allows for efficient compression of the noise estimation network without requiring retraining, a significant advantage over traditional PTQ methods that struggle with diffusion model architectures.
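
To make these two mechanisms concrete, the Python sketch below is illustrative only and is not the repository's implementation: quantize_uniform, split_shortcut_quantize, and timestep_aware_calibration_set are hypothetical names, and denoiser / sampler_step stand in for the noise-estimation network and a single reverse-diffusion update.

    import torch

    def quantize_uniform(x, num_bits=8):
        # Min-max uniform fake quantization: an illustrative stand-in for a real PTQ quantizer.
        qmax = 2 ** num_bits - 1
        lo, hi = x.min(), x.max()
        scale = (hi - lo).clamp(min=1e-8) / qmax
        q = torch.clamp(torch.round((x - lo) / scale), 0, qmax)
        return q * scale + lo

    def split_shortcut_quantize(skip_feats, up_feats, num_bits=8):
        # Split shortcut quantization: the two feature maps joined by a UNet skip
        # connection get separate quantization ranges before concatenation, so one
        # quantizer never has to span their bimodal combined distribution.
        return torch.cat([quantize_uniform(skip_feats, num_bits),
                          quantize_uniform(up_feats, num_bits)], dim=1)

    @torch.no_grad()
    def timestep_aware_calibration_set(denoiser, sampler_step, x_T, total_steps=1000, picks=32):
        # Timestep-aware calibration: collect network inputs spaced uniformly over the
        # sampling trajectory so the calibration data reflects every timestep's statistics.
        keep_every = max(total_steps // picks, 1)
        calib, x_t = [], x_T
        for t in reversed(range(total_steps)):
            if t % keep_every == 0:
                calib.append((x_t.clone(), t))
            t_batch = torch.full((x_t.shape[0],), t, device=x_t.device, dtype=torch.long)
            eps = denoiser(x_t, t_batch)      # noise estimate at timestep t
            x_t = sampler_step(x_t, eps, t)   # one reverse-diffusion update
        return calib

Splitting the shortcut keeps each half's quantization error small, and drawing calibration inputs from the whole trajectory avoids fitting the quantizer to a single noise level.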

Quick Start & Requirements

  • Installation: Clone the repository and create a conda environment using conda env create -f environment.yml.
  • Prerequisites: Requires PyTorch and (implicitly, for GPU acceleration) CUDA, as well as pretrained checkpoints from CompVis (e.g., sd-v1-4.ckpt). Quantized checkpoints are available via Google Drive; a minimal environment check is sketched after this list.
  • Usage: Inference scripts are provided for CIFAR-10, LSUN Bedroom, LSUN Churches, and Stable Diffusion, with example commands for 4/8-bit weight-only and mixed-precision quantization. Calibration scripts are also available.
  • Links: website, paper, NVIDIA TensorRT example
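
Before running the commands above, a quick sanity check of the prerequisites can be done in Python; note that the checkpoint path below follows the CompVis Stable Diffusion layout and is an assumption, not a path documented by this repository.

    import os
    import torch

    # Confirm PyTorch is installed and sees a CUDA-capable GPU (assumed for inference).
    assert torch.cuda.is_available(), "No CUDA device found; a GPU is expected."
    print(f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}")

    # Hypothetical checkpoint location; adjust to wherever sd-v1-4.ckpt was downloaded.
    ckpt_path = "models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"
    if not os.path.exists(ckpt_path):
        print(f"Missing {ckpt_path}; download sd-v1-4.ckpt from CompVis first.")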

Highlighted Details

  • Achieves 4-bit quantization with FID changes of at most 2.34, compared to >100 for traditional PTQ.
  • Enables 4-bit Stable Diffusion inference with high generation quality.
  • Features timestep-aware calibration and split shortcut quantization.
  • Compatible with NVIDIA TensorRT.

Maintenance & Community

The project is associated with ICCV 2023. Further community engagement channels are not explicitly mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. It also builds on CompVis models (whose checkpoints, such as Stable Diffusion, carry their own license terms) and integrates with NVIDIA TensorRT, so suitability for commercial use should be verified before adoption.

Limitations & Caveats

The README mentions that calibration datasets are large, but smaller subsets will be uploaded soon. Reproducing calibrated checkpoints requires specific hyperparameters, and deviations may affect performance.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 30 days
