stable-diffusion by pesser

Latent diffusion model research paper

Created 3 years ago · 1,036 stars · Top 36.9% on sourcepulse

Project Summary

This repository provides the codebase for Latent Diffusion Models (LDMs), an approach to high-resolution image synthesis that runs the diffusion process in a compressed latent space. It supports text-to-image generation, inpainting, and class-conditional synthesis, targeting researchers and practitioners in computer vision and generative AI. The primary benefit is state-of-the-art image quality at a significantly reduced computational cost compared to diffusion models that operate directly in pixel space.

How It Works

LDMs run the diffusion process in a lower-dimensional latent space learned by a pretrained autoencoder. Because diffusion then operates on small feature maps rather than full-resolution pixels, training and inference costs drop substantially. After sampling, the autoencoder's decoder maps the generated latent back to a high-resolution image, yielding a favorable trade-off between computational efficiency and generative quality.
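In code, the idea reduces to "encode once, diffuse small, decode once". The sketch below is a toy illustration with hypothetical stand-ins, not this repository's API: the real code uses a KL- or VQ-regularized autoencoder and a time-conditioned UNet with DDIM/DDPM sampling.

    # Toy LDM pipeline (hypothetical stand-ins, not this repo's API):
    # denoise in a small latent space, then decode the result back to pixels.
    import torch

    class ToyAutoencoder(torch.nn.Module):
        """4x-downsampling autoencoder: 3x256x256 image <-> 4x64x64 latent."""
        def __init__(self):
            super().__init__()
            self.enc = torch.nn.Conv2d(3, 4, kernel_size=8, stride=4, padding=2)
            self.dec = torch.nn.ConvTranspose2d(4, 3, kernel_size=8, stride=4, padding=2)
        def encode(self, x): return self.enc(x)
        def decode(self, z): return self.dec(z)

    @torch.no_grad()
    def sample(ae, denoise, steps=50, shape=(1, 4, 64, 64)):
        z = torch.randn(shape)              # start from Gaussian noise in latent space
        for t in reversed(range(steps)):    # iterative denoising on small feature maps
            z = z - denoise(z, t) / steps   # crude update standing in for a DDIM step
        return ae.decode(z)                 # decoder maps the latent back to pixels

    ae = ToyAutoencoder()
    net = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)  # stand-in for the UNet
    image = sample(ae, lambda z, t: net(z))                # -> shape (1, 3, 256, 256)

Because the loop runs on 64x64 feature maps instead of 256x256 pixels, each denoising step touches 16x fewer spatial positions, which is where the efficiency gain comes from.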

Quick Start & Requirements

  • Install: Create and activate a conda environment with conda env create -f environment.yaml and conda activate ldm (collected in the snippet after this list).
  • Prerequisites: A working conda installation; pre-trained model checkpoints are available for download.
  • Demo: A web demo is available via Hugging Face Spaces.
  • Docs: Official documentation and a Colab notebook are linked for further guidance.
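Collected as a runnable session (the environment commands are taken verbatim from the install step above; checkpoint download locations vary by model, so consult the README's links):

    # Environment setup as described in the README
    conda env create -f environment.yaml
    conda activate ldm
    # Pre-trained checkpoints are downloaded separately; see the README for links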

Highlighted Details

  • Offers text-to-image synthesis with controllable sampling parameters (scale, ddim_steps, ddim_eta); an example invocation follows this list.
  • Supports inpainting tasks with provided example data and scripts.
  • Includes pre-trained models for various datasets (ImageNet, LSUN, CelebA-HQ, FFHQ) and tasks, with reported FID scores.
  • Enables training of custom LDMs and autoencoders from the provided configuration files (see the training command sketched below).
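For orientation, the invocations below follow the conventions of the upstream CompVis/latent-diffusion README; the script name, flag values, and config path are assumptions that should be checked against this repository's scripts/ and configs/ directories:

    # Text-to-image sampling with the parameters named above (values illustrative)
    python scripts/txt2img.py --prompt "a painting of a virus monster playing guitar" \
        --scale 5.0 --ddim_steps 50 --ddim_eta 0.0

    # Training a custom LDM from a configuration file (config path hypothetical)
    python main.py --base configs/latent-diffusion/my-ldm.yaml -t --gpus 0,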

Maintenance & Community

The project is associated with CompVis and has contributions from Katherine Crowson. A Colab notebook is provided for easy experimentation.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, it builds upon OpenAI's ADM codebase and lucidrains' denoising-diffusion-pytorch and x-transformers, which may have their own licenses. Users should verify licensing for commercial use.

Limitations & Caveats

The README explicitly states that this is the development repository and directs users to CompVis/stable-diffusion for the Stable Diffusion release. Some advanced features or specific model configurations might be experimental or require further development.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days

Explore Similar Projects

guided-diffusion by openai (7k stars, top 0.2%)
Image synthesis codebase for diffusion models. Created 4 years ago, updated 1 year ago.
Starred by Aravind Srinivas (Cofounder of Perplexity), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 3 more.

taming-transformers by CompVis (6k stars, top 0.1%)
Image synthesis research paper using transformers. Created 4 years ago, updated 1 year ago.
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 4 more.

stable-diffusion by CompVis (71k stars, top 0.1%)
Latent text-to-image diffusion model. Created 3 years ago, updated 1 year ago.
Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.