semantic-diffusion-model by WeilunWang

Semantic image synthesis with diffusion models

Created 3 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation for Semantic Image Synthesis via Diffusion Models (SDM). It targets researchers and practitioners in generative AI and computer vision, offering a novel framework for high-fidelity and semantically consistent image generation from layout masks.

How It Works

SDM employs a DDPM-based framework that processes the semantic layout and the noisy image through separate pathways. Unlike prior methods that feed both directly into a U-Net, SDM feeds only the noisy image to the U-Net encoder and injects the semantic layout into the decoder through multi-layer spatially-adaptive normalization operators. This design aims to make better use of the semantic information, improving both generation quality and the semantic correspondence of the output. The implementation also incorporates classifier-free guidance sampling for further quality gains.
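The decoder-side conditioning can be illustrated with a minimal, framework-free sketch. Everything here is a stand-in: the real implementation uses PyTorch tensors, and the per-position `gamma`/`beta` are predicted from the semantic layout by learned convolutions rather than passed in directly.

```python
import math

def spade_norm(feature, seg_gamma, seg_beta, eps=1e-5):
    """SPADE-style spatially-adaptive normalization on one flattened channel.

    feature: list of floats (one channel, flattened spatially)
    seg_gamma / seg_beta: per-position scale and shift derived from the
    semantic layout (hypothetical inputs for illustration).
    """
    n = len(feature)
    mean = sum(feature) / n
    var = sum((x - mean) ** 2 for x in feature) / n
    std = math.sqrt(var + eps)
    # Normalize, then modulate with a layout-dependent gamma/beta at each
    # position, instead of the single learned pair used by plain norms.
    return [(x - mean) / std * (1 + g) + b
            for x, g, b in zip(feature, seg_gamma, seg_beta)]

# Toy example: a 4-pixel feature row, modulated more strongly where a
# given semantic class is present (positions 2 and 3).
feat = [0.0, 1.0, 2.0, 3.0]
gamma = [0.0, 0.0, 0.5, 0.5]
beta = [0.0, 0.0, 0.1, 0.1]
out = spade_norm(feat, gamma, beta)
```

Because the modulation parameters vary per spatial position, the layout can steer each region of the decoder's feature maps independently, which is the intuition behind routing the semantic mask to the decoder rather than the encoder.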

Quick Start & Requirements

  • Install: Clone the repository and install dependencies.
  • Prerequisites: Linux, Python, CPU or NVIDIA GPU with CUDA and cuDNN. Datasets (Cityscapes, ADE20K, CelebAMask-HQ, COCO-Stuff) require separate download and preparation following the provided instructions or links.
  • Training: Example commands are provided for training and fine-tuning SDM models on datasets such as ADE20K, using mpiexec for distributed training.
  • Testing: Commands for generating samples and evaluating FID/LPIPS metrics are included.
  • Links: Pretrained model checkpoints and visual results are available for Cityscapes, ADE20K, CelebAMask-HQ, and COCO-Stuff.

Highlighted Details

  • Achieves state-of-the-art performance on benchmark datasets in terms of FID and LPIPS.
  • Novel decoder-side integration of semantic masks via spatially-adaptive normalization.
  • Utilizes classifier-free guidance for improved sampling.
  • Supports training and fine-tuning on multiple datasets.
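The classifier-free guidance mentioned above combines a conditional and an unconditional noise prediction at each sampling step. A minimal sketch of the combination rule, using plain lists as stand-ins for tensors (the function name and toy values are illustrative, not from the repository):

```python
def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Combine conditional and unconditional noise predictions:

        eps_hat = eps_uncond + scale * (eps_cond - eps_uncond)

    scale = 1.0 recovers the purely conditional prediction;
    scale > 1.0 pushes samples harder toward the semantic layout.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Toy 3-element "noise predictions"
cond = [0.2, -0.1, 0.4]
uncond = [0.0, 0.0, 0.0]
guided = classifier_free_guidance(cond, uncond, scale=2.0)
# guided == [0.4, -0.2, 0.8]: the conditional signal is amplified
```

In practice the unconditional prediction comes from the same network run with the semantic layout dropped (or nulled) out, so guidance costs two forward passes per sampling step.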

Maintenance & Community

The project is based on guided-diffusion and acknowledges contributions from OASIS and stargan-v2 for evaluation scripts. No specific community channels or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project is based on guided-diffusion, which is typically MIT licensed, but this specific repository's license requires verification. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README indicates that pretrained models are "to be updated," suggesting potential incompleteness. Dataset preparation requires manual steps and adherence to external instructions. The use of mpiexec implies a distributed computing environment is recommended for efficient training.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
