stablediffusion by Stability-AI

Latent diffusion model for high-resolution image synthesis

Created 2 years ago
41,756 stars

Top 0.7% on SourcePulse

Project Summary

This repository provides Stable Diffusion models for high-resolution image synthesis, targeting researchers and developers interested in text-to-image generation. It offers various model versions, including SD 2.0 and 2.1, with capabilities for text-guided image generation, image modification, upscaling, and inpainting, built upon latent diffusion models.

How It Works

Stable Diffusion is a latent text-to-image diffusion model. A U-Net denoiser is conditioned on text embeddings produced by an OpenCLIP ViT-H/14 text encoder. The models are trained from scratch and, depending on the checkpoint, the U-Net is trained with either a v-prediction or a noise-prediction objective. Diffusion runs in a lower-dimensional latent space rather than in pixel space, which keeps high-resolution synthesis and the image manipulation tasks (modification, upscaling, inpainting) computationally tractable.
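To make the two training objectives concrete, here is a minimal sketch; it is not code from the repository. It assumes the usual variance-preserving forward process x_t = alpha * x_0 + sigma * eps with alpha^2 + sigma^2 = 1, and the v-prediction target v = alpha * eps - sigma * x_0 as defined in the progressive distillation literature. All function names, the 4-element toy latent, and the angle used to pick alpha and sigma are illustrative assumptions.

```python
import math
import random


def noise_sample(x0, eps, alpha, sigma):
    """Forward diffusion: mix the clean latent x0 with Gaussian noise eps."""
    return [alpha * a + sigma * e for a, e in zip(x0, eps)]


def eps_target(x0, eps, alpha, sigma):
    """Noise-prediction objective: the network regresses eps itself."""
    return list(eps)


def v_target(x0, eps, alpha, sigma):
    """v-prediction objective: v = alpha * eps - sigma * x0."""
    return [alpha * e - sigma * a for a, e in zip(x0, eps)]


def x0_from_v(xt, v, alpha, sigma):
    """Recover x0 from x_t and v (valid when alpha^2 + sigma^2 = 1)."""
    return [alpha * x - sigma * w for x, w in zip(xt, v)]


def eps_from_v(xt, v, alpha, sigma):
    """Recover eps from x_t and v (valid when alpha^2 + sigma^2 = 1)."""
    return [sigma * x + alpha * w for x, w in zip(xt, v)]


# Toy 4-element "latent" and noise, seeded for repeatability.
rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]
eps = [rng.gauss(0.0, 1.0) for _ in range(4)]

# Hypothetical noise level: an angle on the unit circle gives alpha, sigma.
phi = 0.3 * math.pi / 2
alpha, sigma = math.cos(phi), math.sin(phi)

xt = noise_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)

# Both parameterizations carry the same information given x_t.
x0_rec = x0_from_v(xt, v, alpha, sigma)
eps_rec = eps_from_v(xt, v, alpha, sigma)
print(max(abs(a - b) for a, b in zip(x0, x0_rec)))
print(max(abs(a - b) for a, b in zip(eps, eps_rec)))
```

Under these assumptions the two targets are interchangeable at inference time: a sampler can convert a predicted v into a predicted x_0 or eps, so the choice mainly affects how the training loss is weighted across noise levels.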

Quick Start & Requirements

  • Installation: run pip install -e . from the root of the cloned repository.
  • Dependencies: PyTorch 1.12.1, Transformers 4.19.2, Diffusers, invisible-watermark.
  • Recommended: xformers for memory-efficient attention on GPUs. It is built from source: clone the xformers repository, compile it (this can take up to 30 minutes), then install it.
  • Intel CPU Optimization: Requires numactl, libjemalloc-dev, intel-openmp, intel_extension_for_pytorch.
  • Model Weights: Must be downloaded separately from Hugging Face.
  • Resources: Requires significant GPU VRAM for higher resolutions (e.g., 768x768).
  • Documentation: Hugging Face for models, Project Page for SD 2.1.
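The link between output resolution and memory use in the Resources item can be illustrated with a small sketch. It assumes an autoencoder downsampling factor of 8 and 4 latent channels, the values commonly cited for Stable Diffusion; the function names are hypothetical and the numbers describe tensor sizes only, not measured VRAM.

```python
def latent_shape(height, width, downsample=8, channels=4):
    """Return (channels, h, w) of the latent for a given image size.

    downsample=8 and channels=4 are assumptions, not values read from
    a checkpoint.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be a multiple of the downsampling factor")
    return (channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, channels=4, rgb_channels=3):
    """Ratio of RGB pixel values to latent values for one image."""
    c, h, w = latent_shape(height, width, downsample, channels)
    return (rgb_channels * height * width) / (c * h * w)


for side in (512, 768):
    c, h, w = latent_shape(side, side)
    positions = h * w  # spatial positions the U-Net processes
    print(side, (c, h, w), positions, compression_ratio(side, side))
```

With these assumed values, a 768x768 image has 2.25 times as many latent positions as a 512x512 image, which is one reason the higher resolution needs noticeably more GPU memory.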

Highlighted Details

  • Offers SD 2.1 models at 768x768 and 512x512 resolutions.
  • Includes depth-guided diffusion for structure-preserving image modification.
  • Provides a text-guided x4 superresolution model.
  • Features a text-guided inpainting model.
  • Supports Intel® Extension for PyTorch* optimizations for CPU inference.

Maintenance & Community

The project acknowledges contributions from Hugging Face, LAION, and the DeepFloyd team. The codebase builds on OpenAI's ADM codebase and lucidrains/denoising-diffusion-pytorch.

Licensing & Compatibility

  • Code License: MIT License.
  • Model Weights License: CreativeML Open RAIL++-M License. This license attaches use-based restrictions to the weights and requires adherence to its usage guidelines; review its terms before any commercial deployment.

Limitations & Caveats

The models mirror biases present in their training data. The provided weights are research artifacts; use in production requires additional safety mechanisms. The README notes potential numerical instabilities with FP16 precision on the v2.1 model's vanilla attention module.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 223 stars in the last 30 days
