LayoutDiffusion  by ZGCTroy

Diffusion model for layout-to-image generation

created 3 years ago
309 stars

Top 88.0% on sourcepulse

Project Summary

LayoutDiffusion is a diffusion model that generates images from layout specifications. It targets researchers and developers in computer vision and generative AI who need precise control over image composition: generation is conditioned on a spatial layout rather than left to the model.

How It Works

LayoutDiffusion builds on the guided-diffusion framework and adds two components: a Layout Fusion Module (LFM), which encodes the layout, and Object-aware Cross-Attention (OaCA). Together they feed spatial layout information into the diffusion process, giving more accurate and controllable image generation than unconditional diffusion models.
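To make the idea of attending from image features to layout objects concrete, here is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the repository's OaCA implementation, whose exact formulation is given in the paper. The function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: one feature vector per image location
    keys, values: one vector per layout object
    Returns one attended vector per image location.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this image location to every layout object.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the object value vectors.
        attended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        out.append(attended)
    return out


# Hypothetical toy data: 2 image locations, 3 layout objects, dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
object_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
object_vals = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
result = cross_attention(image_feats, object_keys, object_vals)
```

Each image location ends up with a blend of the object vectors, weighted toward the objects whose keys it matches best. A real model would use learned projections and tensors rather than Python lists.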

Quick Start & Requirements

  • Install: Use conda for environment setup, then pip install -e ./repositories/dpm_solver.
  • Prerequisites: Python 3.8, PyTorch 1.10.1, CUDA 11.3, omegaconf, opencv-python, gradio.
  • Demo: Run python scripts/launch_gradio_app.py with a specified config file.
  • Pretrained Models: Available for COCO-Stuff and VG datasets at various resolutions.
  • Docs: CVPR 2023 Paper

Highlighted Details

  • Achieves FID scores as low as 15.61 on COCO-Stuff 256x256.
  • Supports training on both latent and image spaces.
  • Includes a Gradio WebUI demo for easy interaction.
  • Provides evaluation scripts for FID, IS, DS, YOLO Score, and CAS.
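
For reference, FID (Fréchet Inception Distance) compares the Gaussian statistics of Inception features computed on real and generated images; lower is better. The standard definition is shown below. The symbols are the conventional ones rather than anything taken from the README, and this summary does not state whether the repository's evaluation script follows the definition exactly.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Here \mu_r, \Sigma_r are the mean and covariance of features from real images, and \mu_g, \Sigma_g are the same statistics for generated images.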

Maintenance & Community

The paper was accepted to CVPR 2023. The README indicates ongoing work on releasing pretrained latent-space models. No community channels (such as Discord or Slack) are listed.

Licensing & Compatibility

The repository is based on openai/guided-diffusion, which is MIT licensed. However, the README does not explicitly state a license for LayoutDiffusion itself. Commercial use or closed-source linking would require clarification of the license.

Limitations & Caveats

The README mentions the COCO-Stuff dataset is deprecated. The evaluation metrics (FID, IS, LPIPS, CAS) are noted as potentially confusing due to historical issues with related works, and the authors recommend newer benchmarks like LDM and Frido for beginners.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days
