stablediffusion by Stability-AI

Latent diffusion model for high-resolution image synthesis

Created 3 years ago
42,125 stars

Top 0.7% on SourcePulse

Project Summary

This repository provides Stable Diffusion models for high-resolution image synthesis, targeting researchers and developers interested in text-to-image generation. It offers various model versions, including SD 2.0 and 2.1, with capabilities for text-guided image generation, image modification, upscaling, and inpainting, built upon latent diffusion models.

How It Works

Stable Diffusion is a latent text-to-image diffusion model: a U-Net denoiser is conditioned on text embeddings from an OpenCLIP ViT-H/14 text encoder. The models are trained from scratch and use either a v-prediction or a noise-prediction objective. Because diffusion runs in a lower-dimensional latent space rather than in pixel space, high-resolution synthesis and image manipulation tasks remain computationally tractable.
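
The relationship between the v-prediction and noise-prediction targets can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's training code: the cosine schedule in alpha_sigma and all function names are assumptions chosen for clarity, and real models operate on latent tensors rather than lists of floats. It uses the usual definition v = alpha_t * eps - sigma_t * x0 with alpha_t^2 + sigma_t^2 = 1, under which either target can be converted into the other.

```python
import math
import random


def alpha_sigma(t):
    """Illustrative cosine schedule (an assumption) with alpha^2 + sigma^2 = 1."""
    angle = 0.5 * math.pi * t
    return math.cos(angle), math.sin(angle)


def add_noise(x0, eps, t):
    """Forward process: z_t = alpha_t * x0 + sigma_t * eps."""
    alpha, sigma = alpha_sigma(t)
    return [alpha * x + sigma * e for x, e in zip(x0, eps)]


def v_target(x0, eps, t):
    """v-prediction target: v = alpha_t * eps - sigma_t * x0."""
    alpha, sigma = alpha_sigma(t)
    return [alpha * e - sigma * x for x, e in zip(x0, eps)]


def x0_from_v(z_t, v, t):
    """Recover the clean sample: x0 = alpha_t * z_t - sigma_t * v."""
    alpha, sigma = alpha_sigma(t)
    return [alpha * z - sigma * vi for z, vi in zip(z_t, v)]


def eps_from_v(z_t, v, t):
    """Recover the noise: eps = sigma_t * z_t + alpha_t * v."""
    alpha, sigma = alpha_sigma(t)
    return [sigma * z + alpha * vi for z, vi in zip(z_t, v)]


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]   # stand-in for a latent
    eps = [rng.gauss(0.0, 1.0) for _ in range(4)]  # Gaussian noise
    t = 0.3
    z_t = add_noise(x0, eps, t)
    v = v_target(x0, eps, t)
    # Both quantities are recoverable from (z_t, v), up to float rounding.
    print(max(abs(a - b) for a, b in zip(x0_from_v(z_t, v, t), x0)))
    print(max(abs(a - b) for a, b in zip(eps_from_v(z_t, v, t), eps)))
```

Since a perfect v estimate yields both x0 and eps, the two objectives describe the same denoising problem; they differ in how the regression target is scaled across noise levels.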

Quick Start & Requirements

  • Installation: run pip install -e . from the repository root.
  • Dependencies: PyTorch 1.12.1, Transformers 4.19.2, Diffusers, invisible-watermark.
  • Recommended: xformers for efficient attention on GPUs. Installing it means cloning the xformers repository and compiling it from source, which can take up to 30 minutes.
  • Intel CPU Optimization: Requires numactl, libjemalloc-dev, intel-openmp, intel_extension_for_pytorch.
  • Model Weights: Must be downloaded separately from Hugging Face.
  • Resources: Requires significant GPU VRAM for higher resolutions (e.g., 768x768).
  • Documentation: Hugging Face for models, Project Page for SD 2.1.
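
Before running any sampling script, it can help to confirm that the pinned dependencies are present. The following is a minimal sketch using only the standard library; the distribution names in REQUIRED (torch, transformers, diffusers, invisible-watermark) are assumptions about how the packages are published, and check_requirements is a hypothetical helper, not part of the repository.

```python
from importlib import metadata

# Pins come from the dependency list above; None means "any version".
# The distribution names are assumptions, not taken from the repository.
REQUIRED = {
    "torch": "1.12.1",
    "transformers": "4.19.2",
    "diffusers": None,
    "invisible-watermark": None,
}


def check_requirements(required):
    """Return {name: status}, where status is 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Ignore local build tags such as "+cu113" when comparing versions.
        base = found.split("+")[0]
        if wanted is None or base == wanted:
            report[name] = "ok"
        else:
            report[name] = f"found {found}, expected {wanted}"
    return report


if __name__ == "__main__":
    for package, status in check_requirements(REQUIRED).items():
        print(f"{package}: {status}")
```

A version mismatch is reported rather than treated as fatal, since newer releases may still work; the pins only reflect the versions listed above.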

Highlighted Details

  • Offers SD 2.1 models at 768x768 and 512x512 resolutions.
  • Includes depth-guided diffusion for structure-preserving image modification.
  • Provides a text-guided x4 superresolution model.
  • Features a text-guided inpainting model.
  • Supports Intel® Extension for PyTorch* optimizations for CPU inference.
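
To make the cost of the two listed resolutions concrete, the sketch below computes the latent tensor shape the diffusion model would work on. It assumes an autoencoder with a spatial downsampling factor of 8 and 4 latent channels; neither number is stated in this summary, so treat both as assumptions, and both function names are hypothetical.

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Shape (C, H, W) of the latent for an RGB image of the given size.

    downsample=8 and channels=4 are assumed defaults, not values
    taken from this summary.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, channels=4):
    """Ratio of RGB pixel values to latent values."""
    c, h, w = latent_shape(height, width, downsample, channels)
    return (3 * height * width) / (c * h * w)


if __name__ == "__main__":
    for size in (512, 768):
        print(size, latent_shape(size, size), compression_ratio(size, size))
    # 512 (4, 64, 64) 48.0
    # 768 (4, 96, 96) 48.0
```

Under these assumptions the 768x768 model denoises 2.25 times as many latent positions as the 512x512 model, which is consistent with its higher VRAM demand.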

Maintenance & Community

The project acknowledges contributions from Hugging Face, LAION, and the DeepFloyd team. The codebase builds on OpenAI's ADM and lucidrains/denoising-diffusion-pytorch.

Licensing & Compatibility

  • Code License: MIT License.
  • Model Weights License: CreativeML Open RAIL++-M License. It attaches use-based restrictions to the weights and their derivatives, so review its terms before any commercial deployment.

Limitations & Caveats

The models mirror biases present in their training data. The provided weights are research artifacts; use in production requires additional safety mechanisms. The README notes potential numerical instabilities with FP16 precision on the v2.1 model's vanilla attention module.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
