Min-SNR-Diffusion-Training  by TiankaiHang

Accelerate diffusion model training

Created 2 years ago
260 stars

Top 97.7% on SourcePulse

Project Summary

This repository implements Min-SNR weighting, a loss-weighting strategy for diffusion model training that accelerates convergence and improves sample quality on image generation tasks. It targets researchers and practitioners who want to cut diffusion training time while reaching state-of-the-art results.

How It Works

The Min-SNR weighting strategy addresses slow diffusion model convergence by treating training as a multi-task learning problem in which each timestep is a separate task. It sets the loss weight of each timestep from that timestep's signal-to-noise ratio (SNR), clamped so that no timestep dominates the loss. This balances the conflicting optimization objectives across timesteps and yields a reported 3.4x faster convergence than previous weighting strategies.
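The clamping described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's code: the function names are hypothetical, the threshold of 5 is inferred from the `min_snr_5` part of the training config name, and the two weight formulas (`min(SNR, gamma)` for x0 prediction, `min(SNR, gamma) / SNR` for noise prediction) are assumed from the Min-SNR paper rather than stated on this page.

```python
def snr_from_alpha_bar(alpha_bar):
    """SNR of a timestep given its cumulative signal level alpha_bar in (0, 1)."""
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(snr_t, gamma=5.0, prediction="x0"):
    """Loss weight for one timestep under Min-SNR clamping (assumed formulas).

    gamma is the clamp threshold; 5.0 is an assumption taken from the
    'min_snr_5' config name. 'prediction' names the network's target.
    """
    clamped = min(snr_t, gamma)
    if prediction == "x0":
        # Predicting the clean image: weight is the clamped SNR itself.
        return clamped
    if prediction == "eps":
        # Predicting the noise: divide by SNR to express the same weighting.
        return clamped / snr_t
    raise ValueError(f"unknown prediction type: {prediction}")


if __name__ == "__main__":
    for alpha_bar in (0.99, 0.5, 0.01):
        s = snr_from_alpha_bar(alpha_bar)
        print(alpha_bar, s, min_snr_weight(s), min_snr_weight(s, prediction="eps"))
```

In a training loop, the per-sample loss at timestep t would be multiplied by this weight before averaging. Low-noise timesteps (large SNR) are capped at gamma, so they no longer dominate the gradient.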

Quick Start & Requirements

  • Training: bash configs/in256/vit-b_layer12_lr1e-4_099_099_pred_x0__min_snr_5__fp16_bs8x32.sh <GPUS> <BATCH_SIZE_PER_GPU>
  • Sampling (ImageNet-256): bash configs/in256/inference.sh or bash configs/in256/inference_limited_interval_guidance.sh
  • Sampling (ImageNet-64): bash configs/in64/inference.sh
  • Prerequisites: Python, CUDA (implied for fp16), ImageNet or CelebA datasets. ImageNet-256 requires pre-processing with AutoencoderKL from HuggingFace Diffusers.
  • Hardware: Requires multiple GPUs for efficient training (e.g., 8 GPUs shown in examples).

Highlighted Details

  • Achieves 3.4x faster convergence than previous weighting strategies.
  • Achieves a record FID of 2.06 on ImageNet 256x256 with smaller architectures.
  • Improves FID further to 1.57 on ImageNet 256x256 using Limited Interval Guidance.
  • Integrated into HuggingFace diffusers and k-diffusion.
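For readers unfamiliar with Limited Interval Guidance, the idea as generally described is to apply classifier-free guidance only within a chosen range of noise levels and to use the plain conditional prediction elsewhere. The sketch below illustrates that idea under stated assumptions: the function name, the use of a timestep index for the interval, and the list-based inputs are all hypothetical, and the repository's `inference_limited_interval_guidance.sh` may differ in its details.

```python
def guided_prediction(cond, uncond, t, scale, t_lo, t_hi):
    """Combine conditional and unconditional predictions at timestep t.

    Guidance with strength 'scale' is applied only when t_lo <= t <= t_hi
    (an assumed, inclusive interval); outside it, the conditional
    prediction is returned unchanged.
    """
    if t_lo <= t <= t_hi:
        # Standard classifier-free guidance: extrapolate away from uncond.
        return [u + scale * (c - u) for c, u in zip(cond, uncond)]
    return list(cond)


if __name__ == "__main__":
    cond, uncond = [1.0, 2.0], [0.0, 0.0]
    print(guided_prediction(cond, uncond, t=5, scale=2.0, t_lo=3, t_hi=7))
    print(guided_prediction(cond, uncond, t=1, scale=2.0, t_lo=3, t_hi=7))
```

With `scale=1.0` the guided branch reduces to the conditional prediction, so the interval only matters when the scale differs from 1.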

Maintenance & Community

The project is based on openai/guided-diffusion and uses sampling/FID evaluation from NVlabs/edm. It has seen adoption in projects like PLAID and MuLan.

Licensing & Compatibility

The README does not state a license for the repository. Because the code builds on openai/guided-diffusion and NVlabs/edm, the licenses of those projects may also constrain commercial use.

Limitations & Caveats

With no license stated in the README, commercial adoption is uncertain. The training scripts are configured for specific model architectures (e.g., ViT-B) and require dataset preparation before use.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
