RelayDiffusion by zai-org

Research paper implementing Relay Diffusion for image synthesis

Created 2 years ago
307 stars

Top 87.3% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Relay Diffusion Model (RDM) offers a novel framework for image synthesis by unifying diffusion processes across resolutions. It enables seamless transitions between different resolutions without restarting from noise, targeting researchers and practitioners in generative AI. RDM achieves state-of-the-art FID scores on CelebA-HQ and sFID on ImageNet-256.

How It Works

RDM employs a two-stage diffusion process. The first stage is a standard diffusion model operating at low resolution, while the second stage uses a "blurring diffusion" process with block-wise noise. Instead of restarting from pure noise at the higher resolution, the second stage starts from the upsampled low-resolution result and progressively removes the blur and block noise to produce the high-resolution image. This avoids retraining or complex conditioning when changing resolutions, offering flexibility and efficiency.
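To make the relay step concrete, here is a minimal sketch of the two-stage control flow. The sample_stage1 and sample_stage2 callables are hypothetical stand-ins for the repository's standard and blurring diffusion samplers, whose real interfaces differ:

import torch
import torch.nn.functional as F

def relay_sample(sample_stage1, sample_stage2, batch_size=4,
                 low_res=64, high_res=256, device="cuda"):
    # Stage 1: ordinary diffusion starting from Gaussian noise at low resolution.
    x_low = sample_stage1(torch.randn(batch_size, 3, low_res, low_res, device=device))

    # Relay: upsample the stage-1 output instead of restarting from pure noise.
    x_init = F.interpolate(x_low, size=(high_res, high_res),
                           mode="bilinear", align_corners=False)

    # Stage 2: blurring diffusion removes the residual blur and block noise
    # from the upsampled image to reach the target resolution.
    return sample_stage2(x_init)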

Quick Start & Requirements

  • Install via conda env create -f environment.yml and conda activate rdm.
  • Recommended: Linux servers with Nvidia A100s. Inference and training are possible on less powerful GPUs by adjusting --batch-gpu.
  • Requires PyTorch.
  • Dataset preparation instructions are provided for CelebA-HQ and ImageNet, following EDM format.
  • Official checkpoints and sampler settings are available for CelebA-HQ and ImageNet (see the loading sketch after this list).
  • Links: WiseModel, ModelScope
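As a rough sketch only: EDM-style checkpoints are typically plain pickle files, so loading one might look like the snippet below. The path and the 'ema' key are assumptions carried over from NVlabs/edm conventions; check the repository's own sampling scripts for the actual format.

import pickle

# Hypothetical checkpoint path; substitute the file actually downloaded
# from WiseModel or ModelScope.
CKPT_PATH = "checkpoints/rdm_celebahq256.pkl"

def load_checkpoint(path, device="cuda"):
    # EDM-style checkpoints are plain pickles; the repository's own modules
    # (e.g. dnnlib, torch_utils) must be importable for unpickling to work.
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["ema"].to(device).eval()  # 'ema' key is an assumption; verify

net = load_checkpoint(CKPT_PATH)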

Highlighted Details

  • Achieves FID=1.87 and sFID=3.97 on ImageNet-256.
  • Enables seamless resolution transfer without re-training.
  • Leverages xformers for memory-efficient attention, reducing training cost by ~15% (see the attention sketch after this list).
  • Supports multiple sampling stages and configurations for fine-grained control.
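For reference, the usual pattern for swapping standard scaled-dot-product attention for xformers' memory-efficient kernel looks like the following; this is illustrative and not taken from the repository's attention modules:

import torch
import xformers.ops as xops

def attention(q, k, v):
    # q, k, v: (batch, seq_len, num_heads, head_dim)
    # Computes softmax(q @ k^T / sqrt(d)) @ v without materializing the full
    # attention matrix, which is what keeps peak memory low.
    return xops.memory_efficient_attention(q, k, v)

q = k = v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
out = attention(q, k, v)  # same shape as q: (2, 1024, 8, 64)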

Maintenance & Community

The implementation is based on the NVlabs/edm codebase. The README does not mention community channels or list active contributors.

Licensing & Compatibility

The repository does not explicitly state a license. The codebase is based on NVlabs/edm, which ships with its own license terms; verify those terms before reuse. Compatibility for commercial use is not specified.

Limitations & Caveats

The README recommends high-end GPUs (Nvidia A100s) for optimal performance, so users with less powerful hardware may face resource constraints. Activation data for ImageNet Precision and Recall calculations can be very large (up to 40GB).

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Robin Huang (Cofounder of Comfy Org), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 17 more.

stablediffusion by Stability-AI

  • Latent diffusion model for high-resolution image synthesis
  • Top 0.1% on SourcePulse, 42k stars
  • Created 2 years ago, updated 2 months ago