RelayDiffusion by zai-org

Implementation of the Relay Diffusion research paper for image synthesis

Created 2 years ago

313 stars

Top 86.3% on SourcePulse

1 Expert Loves This Project

Jiaming Song

Chief Scientist at Luma AI

Project Summary

Relay Diffusion Model (RDM) is a framework for image synthesis that unifies the diffusion process across resolutions. The high-resolution stage continues from the low-resolution result instead of restarting from pure noise. The project targets researchers and practitioners in generative AI, and it reports state-of-the-art FID on CelebA-HQ and state-of-the-art sFID on ImageNet-256.

How It Works

RDM employs a two-stage diffusion process. The first stage is a standard diffusion model that produces a low-resolution image. The second stage uses a "blurring diffusion" process together with block noise to turn that low-resolution image (or noise) into an equivalent high-resolution one, progressively de-blurring it rather than starting again from pure noise. This avoids retraining or complex conditioning when changing resolutions.
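The block-noise idea can be illustrated with a short Python sketch. It is a simplified illustration and not the repository's implementation: the function name block_noise and its parameters are hypothetical, and any scaling or normalization used in the paper is omitted. Noise is sampled on a coarse grid and repeated so that every pixel in a block shares one value.

```python
import random


def block_noise(height, width, block_size, seed=None):
    """Gaussian noise that is constant inside each block_size x block_size block.

    Hypothetical helper for illustration only; not taken from the RDM code.
    """
    if height % block_size or width % block_size:
        raise ValueError("height and width must be multiples of block_size")
    rng = random.Random(seed)
    # One standard-normal sample per block, i.e. noise at the low resolution.
    low = [
        [rng.gauss(0.0, 1.0) for _ in range(width // block_size)]
        for _ in range(height // block_size)
    ]
    # Nearest-neighbour upsampling: pixel (y, x) reads the value of its block.
    return [
        [low[y // block_size][x // block_size] for x in range(width)]
        for y in range(height)
    ]


noise = block_noise(8, 8, block_size=4, seed=0)
```

Here an 8x8 noise map is built from a 2x2 grid of samples, so it contains only four distinct values, one per 4x4 block.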

Quick Start & Requirements

Install with conda env create -f environment.yml, then activate the environment with conda activate rdm.
Recommended: Linux servers with NVIDIA A100 GPUs. Inference and training are possible on less powerful GPUs by lowering --batch-gpu.
Requires PyTorch.
Dataset preparation instructions are provided for CelebA-HQ and ImageNet, following EDM format.
Official checkpoints and sampler settings are available for CelebA-HQ and ImageNet.
Links: WiseModel, ModelScope
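To see why lowering --batch-gpu helps on smaller GPUs, the following Python sketch shows the arithmetic, assuming RDM keeps the semantics of the EDM codebase it is built on: a total batch is split across GPUs, and --batch-gpu caps how many samples each GPU processes at once, with the remainder handled by gradient accumulation. The function accumulation_rounds and its arguments are hypothetical names for illustration, and the total batch option is an assumption carried over from EDM.

```python
def accumulation_rounds(batch, num_gpus, batch_gpu=None):
    """Return (per-GPU micro-batch, number of gradient accumulation rounds).

    Hypothetical helper; mirrors EDM-style batch splitting as an assumption.
    """
    if batch % num_gpus:
        raise ValueError("batch must be divisible by num_gpus")
    per_gpu_total = batch // num_gpus
    # No limit given (or a limit above the share): process the share in one go.
    if batch_gpu is None or batch_gpu > per_gpu_total:
        batch_gpu = per_gpu_total
    if per_gpu_total % batch_gpu:
        raise ValueError("per-GPU share must be divisible by batch_gpu")
    return batch_gpu, per_gpu_total // batch_gpu


# Total batch of 512 on 8 GPUs with at most 16 samples in memory per GPU.
micro_batch, rounds = accumulation_rounds(batch=512, num_gpus=8, batch_gpu=16)
```

A smaller per-GPU limit lowers peak memory at the cost of more accumulation rounds per optimizer step, while the effective batch size stays the same.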

Highlighted Details

Achieves FID=1.87 and sFID=3.97 on ImageNet-256.
Enables seamless resolution transfer without retraining.
Leverages xformers for memory-efficient attention, reducing training cost by ~15%.
Supports multiple sampling stages and configurations for fine-grained control.
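For readers unfamiliar with the FID metric quoted above, it is the Fréchet distance between two Gaussians fitted to image features of real and generated samples. The Python sketch below computes that distance for the simplified case of diagonal covariances, which is an assumption made here to keep the example dependency-free; actual FID uses full covariance matrices of Inception features. The function name is hypothetical.

```python
import math


def frechet_distance_diag(mean_a, var_a, mean_b, var_b):
    """Frechet distance between two Gaussians with diagonal covariances.

    General form: |mu_a - mu_b|^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2)).
    With diagonal S the trace term reduces to sum((sqrt(v_a) - sqrt(v_b))^2).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    cov_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b)
    )
    return mean_term + cov_term


# Toy two-dimensional feature statistics, chosen only for illustration.
distance = frechet_distance_diag([0.0, 1.0], [1.0, 4.0], [1.0, 1.0], [4.0, 4.0])
```

Lower values mean the generated feature statistics are closer to those of the real data, and identical statistics give a distance of zero.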

Maintenance & Community

The implementation is based on the NVlabs/edm codebase. No specific community channels or active contributor information is detailed in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The codebase is based on NVlabs/edm, which is distributed under a Creative Commons non-commercial license (CC BY-NC-SA 4.0) rather than a permissive one, so derived code may carry the same restrictions; this should be verified. Compatibility for commercial use is not specified.

Limitations & Caveats

The README recommends high-end GPUs (NVIDIA A100s), so users with less powerful hardware may face resource constraints and need to lower --batch-gpu. Activation data for ImageNet Precision and Recall calculations can be very large (up to 40GB).

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days