Min-SNR-Diffusion-Training by TiankaiHang

Accelerate diffusion model training

Created 2 years ago

263 stars

Top 97.0% on SourcePulse

Project Summary

This repository provides an efficient diffusion model training strategy, Min-SNR weighting, designed to accelerate convergence and improve sample quality for image generation tasks. It is targeted at researchers and practitioners working with diffusion models who aim to reduce training time and achieve state-of-the-art results.

How It Works

The Min-SNR weighting strategy addresses slow diffusion model convergence by treating training as a multi-task learning problem in which each timestep is a separate task. It sets the loss weight for each timestep from its signal-to-noise ratio (SNR), clamped at a fixed threshold so that no single range of noise levels dominates the loss. This balances conflicting optimization objectives across timesteps, and the project reports 3.4x faster convergence than previous weighting strategies.
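The clamped-SNR weighting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's code: it assumes SNR(t) = alpha_bar_t / (1 - alpha_bar_t), a clamp threshold of 5 (suggested by `min_snr_5` in the training script name), a weight of min(SNR, gamma) for x0 prediction, and min(SNR, gamma) / SNR for noise prediction. The linear beta schedule and all function names are hypothetical and chosen only for the example.

```python
def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def snr_from_alpha_bar(alpha_bar):
    """Signal-to-noise ratio of a noised sample: alpha_bar / (1 - alpha_bar)."""
    return alpha_bar / (1.0 - alpha_bar)


def min_snr_weight(snr, gamma=5.0, prediction="x0"):
    """Loss weight for one timestep, with the SNR clamped at gamma.

    Assumed forms: min(snr, gamma) for x0 prediction,
    min(snr, gamma) / snr for noise (eps) prediction.
    """
    clamped = min(snr, gamma)
    if prediction == "x0":
        return clamped
    if prediction == "eps":
        return clamped / snr
    raise ValueError(f"unknown prediction type: {prediction!r}")


if __name__ == "__main__":
    snrs = [snr_from_alpha_bar(a) for a in linear_alpha_bars()]
    # Print a few timesteps: low-noise steps are clamped, high-noise steps are not.
    for t in (0, 250, 500, 999):
        print(t, snrs[t], min_snr_weight(snrs[t]), min_snr_weight(snrs[t], prediction="eps"))
```

In a training loop, the per-sample mean-squared error would be multiplied by this weight for the sampled timestep before averaging over the batch. The clamp caps the influence of low-noise timesteps, where the SNR is very large.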

Quick Start & Requirements

Training: bash configs/in256/vit-b_layer12_lr1e-4_099_099_pred_x0__min_snr_5__fp16_bs8x32.sh <GPUS> <BATCH_SIZE_PER_GPU>
Sampling (ImageNet-256): bash configs/in256/inference.sh or bash configs/in256/inference_limited_interval_guidance.sh
Sampling (ImageNet-64): bash configs/in64/inference.sh
Prerequisites: Python, CUDA (implied for fp16), ImageNet or CelebA datasets. ImageNet-256 requires pre-processing with AutoencoderKL from HuggingFace Diffusers.
Hardware: Requires multiple GPUs for efficient training (e.g., 8 GPUs shown in examples).

Highlighted Details

Converges 3.4x faster than previous weighting strategies.
Reaches a record FID of 2.06 on ImageNet 256x256 with smaller architectures.
Improves FID further to 1.57 on ImageNet 256x256 using Limited Interval Guidance.
Integrated into HuggingFace diffusers and k-diffusion.
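The Limited Interval Guidance mentioned in the list above is not described further on this page. As a rough sketch only, assuming it means that classifier-free guidance is applied when the noise level falls inside a chosen interval and switched off elsewhere, the idea could look like the following. The function names, the interval bounds, and the guidance scale are hypothetical and are not taken from `inference_limited_interval_guidance.sh`.

```python
def guidance_scale_at(sigma, scale, sigma_lo, sigma_hi):
    """Return `scale` when sigma lies in (sigma_lo, sigma_hi], else 1.0 (no guidance)."""
    return scale if sigma_lo < sigma <= sigma_hi else 1.0


def guided_prediction(cond, uncond, sigma, scale, sigma_lo, sigma_hi):
    """Blend conditional and unconditional predictions with interval-limited guidance.

    `cond` and `uncond` are flat lists of floats standing in for model outputs.
    Outside the interval the conditional prediction is returned unchanged.
    """
    w = guidance_scale_at(sigma, scale, sigma_lo, sigma_hi)
    if w == 1.0:
        return list(cond)
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond, uncond = [1.0, 2.0], [0.5, 1.0]
    # Hypothetical interval (0.3, 1.0] and scale 2.0, for illustration only.
    for sigma in (0.1, 0.5, 5.0):
        print(sigma, guided_prediction(cond, uncond, sigma, 2.0, 0.3, 1.0))
```

One practical consequence of this design is that the unconditional model pass can be skipped outside the interval, since its output is not used there.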

Maintenance & Community

The project is based on openai/guided-diffusion and uses sampling/FID evaluation from NVlabs/edm. It has seen adoption in projects like PLAID and MuLan.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Because the code builds on openai/guided-diffusion and NVlabs/edm, the licenses of those projects may also apply, so review them before any commercial use.

Limitations & Caveats

The missing license leaves the terms for commercial adoption unclear. The training scripts are configured for specific model architectures (e.g., ViT-B) and assume the dataset has already been prepared.
