semantic-diffusion-model by WeilunWang

Semantic image synthesis with diffusion models

Created 3 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation for Semantic Image Synthesis via Diffusion Models (SDM). It targets researchers and practitioners in generative AI and computer vision, offering a novel framework for high-fidelity and semantically consistent image generation from layout masks.

How It Works

SDM employs a DDPM-based framework that processes the semantic layout and the noisy image through separate pathways. Unlike prior methods that feed both directly into a U-Net, SDM feeds only the noisy image to the U-Net encoder and injects the semantic layout into the decoder through multi-layer spatially-adaptive normalization operators. This design aims to make better use of the semantic information, improving both generation quality and the semantic correspondence of the output. The implementation also incorporates classifier-free guidance sampling for further quality gains.
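The decoder-side conditioning can be illustrated with a minimal, framework-free sketch. Everything here is a stand-in: the real implementation uses PyTorch tensors, and the per-position `gamma`/`beta` are predicted from the semantic layout by learned convolutions rather than passed in directly.

```python
import math

def spade_norm(feature, seg_gamma, seg_beta, eps=1e-5):
    """SPADE-style spatially-adaptive normalization on one flattened channel.

    feature: list of floats (one channel, flattened spatially)
    seg_gamma / seg_beta: per-position scale and shift derived from the
    semantic layout (hypothetical inputs for illustration).
    """
    n = len(feature)
    mean = sum(feature) / n
    var = sum((x - mean) ** 2 for x in feature) / n
    std = math.sqrt(var + eps)
    # Normalize, then modulate with a layout-dependent gamma/beta at each
    # position, instead of the single learned pair used by plain norms.
    return [(x - mean) / std * (1 + g) + b
            for x, g, b in zip(feature, seg_gamma, seg_beta)]

# Toy example: a 4-pixel feature row, modulated more strongly where a
# given semantic class is present (positions 2 and 3).
feat = [0.0, 1.0, 2.0, 3.0]
gamma = [0.0, 0.0, 0.5, 0.5]
beta = [0.0, 0.0, 0.1, 0.1]
out = spade_norm(feat, gamma, beta)
```

Because the modulation parameters vary per spatial position, the layout can steer each region of the decoder's feature maps independently, which is the intuition behind routing the semantic mask to the decoder rather than the encoder.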

Quick Start & Requirements

  • Install: Clone the repository and install dependencies.
  • Prerequisites: Linux, Python, CPU or NVIDIA GPU with CUDA and cuDNN. Datasets (Cityscapes, ADE20K, CelebAMask-HQ, COCO-Stuff) require separate download and preparation following the provided instructions or links.
  • Training: Example commands are provided for training and fine-tuning SDM models on datasets such as ADE20K, using mpiexec for distributed training.
  • Testing: Commands for generating samples and evaluating FID/LPIPS metrics are included.
  • Links: Pretrained model checkpoints and visual results are available for Cityscapes, ADE20K, CelebAMask-HQ, and COCO-Stuff.

Highlighted Details

  • Achieves state-of-the-art performance on benchmark datasets in terms of FID and LPIPS.
  • Novel decoder-side integration of semantic masks via spatially-adaptive normalization.
  • Utilizes classifier-free guidance for improved sampling.
  • Supports training and fine-tuning on multiple datasets.
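The classifier-free guidance mentioned above combines a conditional and an unconditional noise prediction at each sampling step. A minimal sketch of the combination rule, using plain lists as stand-ins for tensors (the function name and toy values are illustrative, not from the repository):

```python
def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Combine conditional and unconditional noise predictions:

        eps_hat = eps_uncond + scale * (eps_cond - eps_uncond)

    scale = 1.0 recovers the purely conditional prediction;
    scale > 1.0 pushes samples harder toward the semantic layout.
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Toy 3-element "noise predictions"
cond = [0.2, -0.1, 0.4]
uncond = [0.0, 0.0, 0.0]
guided = classifier_free_guidance(cond, uncond, scale=2.0)
# guided == [0.4, -0.2, 0.8]: the conditional signal is amplified
```

In practice the unconditional prediction comes from the same network run with the semantic layout dropped (or nulled) out, so guidance costs two forward passes per sampling step.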

Maintenance & Community

The project is based on guided-diffusion and acknowledges contributions from OASIS and stargan-v2 for evaluation scripts. No specific community channels or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project is based on guided-diffusion, which is typically MIT licensed, but this specific repository's license requires verification. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README indicates that pretrained models are "to be updated," suggesting potential incompleteness. Dataset preparation requires manual steps and adherence to external instructions. The use of mpiexec implies a distributed computing environment is recommended for efficient training.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
