stable-diffusion by pesser

Latent diffusion model research paper

Created 3 years ago
1,035 stars

Top 36.3% on SourcePulse

Project Summary

This repository provides the codebase for Latent Diffusion Models (LDMs), a novel approach to high-resolution image synthesis. It enables text-to-image generation, inpainting, and class-conditional synthesis, targeting researchers and practitioners in computer vision and generative AI. The primary benefit is achieving state-of-the-art image quality with significantly reduced computational cost compared to previous diffusion models.

How It Works

LDMs operate in a lower-dimensional latent space, learned by an autoencoder. This latent space representation allows the diffusion process to operate on smaller feature maps, drastically reducing computational requirements for training and inference. The model then decodes the generated latent representation back into a high-resolution image. This approach offers a favorable trade-off between computational efficiency and generative quality.
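The compute savings from working in latent space can be illustrated with a minimal sketch. The 8× downsampling factor and 4-channel latent below correspond to the f=8 autoencoder configuration commonly used with LDMs and are assumed here for illustration; actual sizes depend on the autoencoder chosen.

```python
# Minimal sketch of why diffusing in latent space is cheaper.
# Assumes an f=8 autoencoder with a 4-channel latent; the real
# feature-map sizes depend on the configured autoencoder.

def latent_shape(height, width, channels=4, downsample=8):
    """Shape of the latent feature map for an image of (height, width)."""
    return (channels, height // downsample, width // downsample)

def num_elements(shape):
    n = 1
    for d in shape:
        n *= d
    return n

image_shape = (3, 256, 256)           # RGB input image
z_shape = latent_shape(256, 256)      # (4, 32, 32) latent

pixels = num_elements(image_shape)    # 196,608 values
latents = num_elements(z_shape)       # 4,096 values

# The denoising U-Net runs on ~48x fewer values per step.
print(pixels // latents)  # -> 48
```

Every denoising step of the U-Net therefore touches roughly 48× less data than a pixel-space diffusion model at the same output resolution, which is the source of the efficiency trade-off described above.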

Quick Start & Requirements

  • Install: Create and activate a conda environment using conda env create -f environment.yaml and conda activate ldm.
  • Prerequisites: Requires conda (the environment is defined in environment.yaml). Pre-trained model checkpoints are available for download.
  • Demo: A web demo is available via Huggingface Spaces.
  • Docs: Official documentation and a Colab notebook are linked for further guidance.

Highlighted Details

  • Offers text-to-image synthesis with controllable sampling parameters (scale, ddim_steps, ddim_eta).
  • Supports inpainting tasks with provided example data and scripts.
  • Includes pre-trained models for various datasets (ImageNet, LSUN, CelebA-HQ, FFHQ) and tasks, with reported FID scores.
  • Enables training of custom LDMs and autoencoders with provided configuration files.
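The roles of the sampling parameters listed above can be sketched under the standard DDIM formulation with classifier-free guidance. This is a conceptual sketch, not the repository's implementation; all function and variable names below are illustrative.

```python
import math

# Illustrative sketch of the sampling knobs (ddim_steps, ddim_eta, scale)
# under the standard DDIM formulation. Names here are assumptions, not
# the repository's own API.

def ddim_timesteps(num_train_steps=1000, ddim_steps=50):
    """Evenly subsample the training timesteps for DDIM sampling."""
    stride = num_train_steps // ddim_steps
    return list(range(0, num_train_steps, stride))  # ddim_steps entries

def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Per-step noise scale; eta=0.0 makes sampling fully deterministic."""
    return (eta
            * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t))
            * math.sqrt(1 - alpha_bar_t / alpha_bar_prev))

def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: push toward the conditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

steps = ddim_timesteps(ddim_steps=50)
print(len(steps))                     # 50 sampling steps instead of 1000
print(ddim_sigma(0.1, 0.5, eta=0.0))  # 0.0 -> deterministic DDIM
```

In short, ddim_steps trades sampling time for quality by subsampling the diffusion schedule, ddim_eta interpolates between deterministic DDIM (0.0) and stochastic DDPM-like sampling (1.0), and scale controls how strongly the text conditioning steers each denoising step.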

Maintenance & Community

The project is associated with CompVis and has contributions from Katherine Crowson. A Colab notebook is provided for easy experimentation.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, it builds upon OpenAI's ADM codebase and lucidrains' denoising-diffusion-pytorch and x-transformers, which may have their own licenses. Users should verify licensing for commercial use.

Limitations & Caveats

The README explicitly states that this is the development repository and directs users to CompVis/stable-diffusion for the Stable Diffusion release. Some advanced features or specific model configurations might be experimental or require further development.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

Multilingual text-to-image latent diffusion model (3k stars; created 2 years ago, updated 1 year ago)