RelayDiffusion  by zai-org

Implementation of the Relay Diffusion research paper for image synthesis

created 1 year ago
308 stars

Top 88.2% on sourcepulse

Project Summary

Relay Diffusion Model (RDM) is a framework for image synthesis that unifies diffusion processes across resolutions: generation can continue at a new resolution without restarting from pure noise. It is aimed at researchers and practitioners in generative AI. RDM reports state-of-the-art FID on CelebA-HQ and state-of-the-art sFID on ImageNet-256.

How It Works

RDM is a cascaded, two-stage pipeline. The first stage is a standard diffusion model that generates a low-resolution image. The second stage uses a "blurring diffusion" process together with block noise to turn that low-resolution image (or noise) into an equivalent high-resolution starting point, which it then progressively de-blurs and denoises. Because the second stage continues from the first stage's output, it does not need to restart from pure noise or rely on low-resolution conditioning.
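The block-structured noise mentioned above can be illustrated with a minimal sketch. It assumes that "block noise" means Gaussian noise sampled once per low-resolution pixel and repeated over the matching block of high-resolution pixels. The function name block_noise and its parameters are hypothetical, and the repository's actual scaling and noise schedule are not reproduced here.

```python
import numpy as np


def block_noise(height, width, block_size, rng=None):
    """Gaussian noise that is constant inside each block_size x block_size block.

    Illustrative only: a hypothetical helper, not a function from the RDM code.
    """
    if height % block_size or width % block_size:
        raise ValueError("height and width must be multiples of block_size")
    rng = np.random.default_rng() if rng is None else rng
    # One independent Gaussian sample per block (one per low-resolution pixel).
    coarse = rng.standard_normal((height // block_size, width // block_size))
    # Nearest-neighbour upsampling: repeat each sample across its whole block.
    return np.kron(coarse, np.ones((block_size, block_size)))


noise = block_noise(8, 8, block_size=4, rng=np.random.default_rng(0))
print(noise.shape)
```

Noise built this way at the high resolution carries the same information as independent per-pixel noise at the low resolution, which is the intuition behind handing a low-resolution result to a high-resolution stage.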

Quick Start & Requirements

  • Set up the environment with conda env create -f environment.yml, then activate it with conda activate rdm.
  • Recommended: Linux servers with Nvidia A100s. Inference and training are possible on less powerful GPUs by adjusting --batch-gpu.
  • Requires PyTorch.
  • Dataset preparation instructions are provided for CelebA-HQ and ImageNet, following EDM format.
  • Official checkpoints and sampler settings are available for CelebA-HQ and ImageNet.
  • Links: WiseModel, Model Scope
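To make the --batch-gpu adjustment concrete, here is a small arithmetic sketch. It assumes RDM follows the convention of the EDM codebase it builds on, where a per-GPU limit smaller than each GPU's share of the global batch is made up by gradient accumulation. The function accumulation_rounds is a hypothetical helper, not part of the repository, and the numbers are examples only.

```python
def accumulation_rounds(batch, num_gpus, batch_gpu=None):
    """Gradient-accumulation rounds needed to reach the global batch size.

    Hypothetical helper; assumes EDM-style handling of a per-GPU batch limit.
    """
    if batch % num_gpus:
        raise ValueError("batch must be divisible by num_gpus")
    per_gpu_total = batch // num_gpus  # samples each GPU handles per step
    if batch_gpu is None or batch_gpu > per_gpu_total:
        batch_gpu = per_gpu_total  # no limit needed: a single pass per step
    if per_gpu_total % batch_gpu:
        raise ValueError("per-GPU share must be divisible by batch_gpu")
    return per_gpu_total // batch_gpu


# Example values: global batch of 4096 on 8 GPUs, at most 32 samples per pass.
print(accumulation_rounds(4096, num_gpus=8, batch_gpu=32))  # 16
```

Under this assumption, lowering the per-GPU limit trades memory for time: the effective batch size stays the same, but each optimizer step takes more forward and backward passes.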

Highlighted Details

  • Achieves FID=1.87 and sFID=3.97 on ImageNet-256.
  • Continues generation at a higher resolution without restarting from pure noise.
  • Leverages xformers for memory-efficient attention, reducing training cost by ~15%.
  • Supports multiple sampling stages and configurations for fine-grained control.

Maintenance & Community

The implementation is based on the NVlabs/edm codebase. No specific community channels or active contributor information is detailed in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The codebase is based on NVlabs/edm, which is released under the non-commercial CC BY-NC-SA 4.0 license, so commercial use should not be assumed to be permitted. Verify the terms before reuse.

Limitations & Caveats

The README recommends high-end GPUs (Nvidia A100s) for optimal performance, suggesting potential resource constraints for users with less powerful hardware. Activation data for ImageNet Precision and Recall calculations can be very large (up to 40GB).

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days
