stable-diffusion by pesser

Latent diffusion model research paper

Created 3 years ago
1,035 stars

Top 36.3% on SourcePulse

Project Summary

This repository provides the codebase for Latent Diffusion Models (LDMs), a novel approach to high-resolution image synthesis. It enables text-to-image generation, inpainting, and class-conditional synthesis, targeting researchers and practitioners in computer vision and generative AI. The primary benefit is achieving state-of-the-art image quality with significantly reduced computational cost compared to previous diffusion models.

How It Works

LDMs operate in a lower-dimensional latent space, learned by an autoencoder. This latent space representation allows the diffusion process to operate on smaller feature maps, drastically reducing computational requirements for training and inference. The model then decodes the generated latent representation back into a high-resolution image. This approach offers a favorable trade-off between computational efficiency and generative quality.
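The compute savings from working in latent space can be illustrated with a minimal sketch. The 8× downsampling factor and 4-channel latent below correspond to the f=8 autoencoder configuration commonly used with LDMs and are assumed here for illustration; actual sizes depend on the autoencoder chosen.

```python
# Minimal sketch of why diffusing in latent space is cheaper.
# Assumes an f=8 autoencoder with a 4-channel latent; the real
# feature-map sizes depend on the configured autoencoder.

def latent_shape(height, width, channels=4, downsample=8):
    """Shape of the latent feature map for an image of (height, width)."""
    return (channels, height // downsample, width // downsample)

def num_elements(shape):
    n = 1
    for d in shape:
        n *= d
    return n

image_shape = (3, 256, 256)           # RGB input image
z_shape = latent_shape(256, 256)      # (4, 32, 32) latent

pixels = num_elements(image_shape)    # 196,608 values
latents = num_elements(z_shape)       # 4,096 values

# The denoising U-Net runs on ~48x fewer values per step.
print(pixels // latents)  # -> 48
```

Every denoising step of the U-Net therefore touches roughly 48× less data than a pixel-space diffusion model at the same output resolution, which is the source of the efficiency trade-off described above.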

Quick Start & Requirements

  • Install: Create and activate a conda environment using conda env create -f environment.yaml and conda activate ldm.
  • Prerequisites: Requires conda (the environment is defined in environment.yaml). Pre-trained model checkpoints are available for download.
  • Demo: A web demo is available via Huggingface Spaces.
  • Docs: Official documentation and a Colab notebook are linked for further guidance.

Highlighted Details

  • Offers text-to-image synthesis with controllable sampling parameters (scale, ddim_steps, ddim_eta).
  • Supports inpainting tasks with provided example data and scripts.
  • Includes pre-trained models for various datasets (ImageNet, LSUN, CelebA-HQ, FFHQ) and tasks, with reported FID scores.
  • Enables training of custom LDMs and autoencoders with provided configuration files.
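The roles of the sampling parameters listed above can be sketched under the standard DDIM formulation with classifier-free guidance. This is a conceptual sketch, not the repository's implementation; all function and variable names below are illustrative.

```python
import math

# Illustrative sketch of the sampling knobs (ddim_steps, ddim_eta, scale)
# under the standard DDIM formulation. Names here are assumptions, not
# the repository's own API.

def ddim_timesteps(num_train_steps=1000, ddim_steps=50):
    """Evenly subsample the training timesteps for DDIM sampling."""
    stride = num_train_steps // ddim_steps
    return list(range(0, num_train_steps, stride))  # ddim_steps entries

def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Per-step noise scale; eta=0.0 makes sampling fully deterministic."""
    return (eta
            * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t))
            * math.sqrt(1 - alpha_bar_t / alpha_bar_prev))

def guided_eps(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: push toward the conditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

steps = ddim_timesteps(ddim_steps=50)
print(len(steps))                     # 50 sampling steps instead of 1000
print(ddim_sigma(0.1, 0.5, eta=0.0))  # 0.0 -> deterministic DDIM
```

In short, ddim_steps trades sampling time for quality by subsampling the diffusion schedule, ddim_eta interpolates between deterministic DDIM (0.0) and stochastic DDPM-like sampling (1.0), and scale controls how strongly the text conditioning steers each denoising step.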

Maintenance & Community

The project is associated with CompVis and has contributions from Katherine Crowson. A Colab notebook is provided for easy experimentation.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, it builds upon OpenAI's ADM codebase and lucidrains' denoising-diffusion-pytorch and x-transformers, which may have their own licenses. Users should verify licensing for commercial use.

Limitations & Caveats

The README explicitly states that this is the development repository and directs users to CompVis/stable-diffusion for the Stable Diffusion release. Some advanced features or specific model configurations might be experimental or require further development.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

Multilingual text-to-image latent diffusion model (3k stars; created 2 years ago, updated 1 year ago)