latent-diffusion  by CompVis

Image synthesis research paper using latent diffusion models

created 3 years ago
13,142 stars

Top 3.8% on sourcepulse

Project Summary

This repository provides the official implementation for Latent Diffusion Models (LDMs), a class of generative models capable of high-resolution image synthesis. It targets researchers and practitioners in computer vision and deep learning interested in state-of-the-art image generation, offering pre-trained models and code for various tasks including text-to-image, inpainting, and retrieval-augmented generation.

How It Works

LDMs operate by performing diffusion in a lower-dimensional latent space learned by an autoencoder. This approach significantly reduces computational cost compared to diffusion in pixel space, enabling high-resolution synthesis with greater efficiency. The models leverage a U-Net architecture for the diffusion process and can be conditioned on various inputs like text embeddings or retrieved image features, allowing for controllable and context-aware generation.
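To make the cost argument concrete, the sketch below counts how many values the diffusion model has to process per image in pixel space versus latent space. The helper names `latent_shape` and `num_elements`, the 256x256 RGB input, and the 4-channel latent are illustrative assumptions, not values read from the repository's code.

```python
def latent_shape(height, width, f, latent_channels):
    """Shape (channels, h, w) of the latent produced by an autoencoder
    that downsamples each spatial side by a factor f (assumed layout)."""
    if height % f or width % f:
        raise ValueError("image size must be divisible by f")
    return (latent_channels, height // f, width // f)


def num_elements(shape):
    """Total number of scalar values in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


pixel = (3, 256, 256)  # assumed RGB input
latent = latent_shape(256, 256, f=8, latent_channels=4)  # hypothetical config
ratio = num_elements(pixel) / num_elements(latent)
print(latent, ratio)  # (4, 32, 32) 48.0
```

Under these assumptions the U-Net sees 48 times fewer values per image, which is where most of the efficiency gain comes from.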

Quick Start & Requirements

  • Install: Create the conda environment with conda env create -f environment.yaml, then activate it with conda activate ldm.
  • Prerequisites: PyTorch, transformers, scann, kornia, torchmetrics, einops. Specific versions are noted for retrieval-augmented models.
  • Models: Pre-trained models for various tasks (text-to-image, inpainting, etc.) and datasets (ImageNet, LSUN, CelebA-HQ) are available for download via provided links and scripts (scripts/download_models.sh, scripts/download_first_stages.sh).
  • Demo: A web demo is available on Hugging Face Spaces.

Highlighted Details

  • Supports text-conditional image synthesis with a 1.45B parameter model trained on LAION-400M.
  • Achieves an FID of 3.6 on ImageNet with classifier-free guidance.
  • Includes code for Retrieval-Augmented Diffusion Models (RDMs) for enhanced control and retrieval-based sampling.
  • Offers pre-trained autoencoders with varying downsampling factors (f=4, 8, 16, 32) and regularization types (VQ, KL), with reported rFID scores.
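
The classifier-free guidance mentioned in the list above combines an unconditional and a conditional noise prediction at each sampling step. The following is a minimal sketch of that combination on plain Python lists; the function name `guided_noise` and its parameters are hypothetical, and in practice the two predictions would be tensors produced by the U-Net rather than lists.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Blend two noise predictions with the standard classifier-free
    guidance rule: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# scale = 0 ignores the condition, scale = 1 uses the conditional
# prediction as-is, and scale > 1 extrapolates past it.
print(guided_noise([0.0, 1.0], [1.0, 1.0], 3.0))  # [3.0, 1.0]
```

Larger scales generally push samples closer to the condition at the cost of diversity.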

Maintenance & Community

The project is associated with the Ommer Lab at Heidelberg University. Key contributors include Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. The codebase builds upon OpenAI's ADM and lucidrains' denoising-diffusion-pytorch and x-transformers.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, the underlying components and the nature of the research suggest a focus on academic and research use. Commercial use would require careful review of any associated licenses for dependencies and pre-trained models.

Limitations & Caveats

The README notes that controllability is reduced when sampling at resolutions beyond 256x256. Some retrieval databases (e.g., OpenImages) are large (11 GB+) and may require significant disk space and processing time for index creation. The ArtBench databases are noted as less effective for detailed text control.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 403 stars in the last 90 days
