stable-diffusion by CompVis

Latent text-to-image diffusion model

created 3 years ago
71,230 stars

Top 0.2% on sourcepulse

Project Summary

Stable Diffusion is a latent text-to-image diffusion model that generates high-resolution images from textual prompts. It is designed for researchers and developers working on generative AI, offering a relatively lightweight architecture for text-to-image synthesis and image modification tasks.

How It Works

The model uses a latent diffusion approach: the diffusion process runs in a lower-dimensional latent space rather than on pixels, which reduces computational requirements. A frozen CLIP ViT-L/14 text encoder conditions generation on the text prompt, and an 860M-parameter UNet performs the denoising. This architecture lets the model run on GPUs with at least 10GB of VRAM, making it more accessible than models that operate directly in pixel space.
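To give a rough sense of how much smaller the latent space is, the following sketch computes the latent tensor shape and the resulting reduction in values per image. The downsampling factor of 8 and the 4 latent channels are assumptions taken from the usual Stable Diffusion v1 configuration, not from the summary above, and the function names are hypothetical.

```python
# Assumed values for Stable Diffusion v1 (not stated in the summary above).
DOWNSAMPLING_FACTOR = 8  # autoencoder shrinks each spatial side by this factor
LATENT_CHANNELS = 4      # number of channels in the latent tensor
RGB_CHANNELS = 3


def latent_shape(height, width):
    """Return (channels, height, width) of the latent for an RGB image."""
    if height % DOWNSAMPLING_FACTOR or width % DOWNSAMPLING_FACTOR:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return (
        LATENT_CHANNELS,
        height // DOWNSAMPLING_FACTOR,
        width // DOWNSAMPLING_FACTOR,
    )


def compression_ratio(height, width):
    """Ratio of pixel-space values to latent-space values for one image."""
    channels, latent_h, latent_w = latent_shape(height, width)
    return (RGB_CHANNELS * height * width) / (channels * latent_h * latent_w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions, the UNet processes about 48 times fewer values per 512x512 image than a pixel-space model would.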

Quick Start & Requirements

  • Installation: Create and activate the conda environment (conda env create -f environment.yaml, then conda activate ldm). To update an existing latent diffusion environment instead, install the extra dependencies (pip install transformers==4.19.2 diffusers invisible-watermark, then pip install -e .).
  • Prerequisites: Python, PyTorch, Transformers, Diffusers.
  • Usage: Run sampling scripts like python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse".
  • Diffusers Integration: Use StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda").
  • Resources: Requires a GPU with at least 10GB VRAM.
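The Diffusers route above can be sketched as a small helper. This is a minimal illustration rather than the project's reference script: the function name generate_image, the output path, the optional seed, and the choice of half precision are assumptions made here, and running it requires a CUDA GPU plus access to the weights.

```python
MODEL_ID = "CompVis/stable-diffusion-v1-4"  # checkpoint named in the summary


def generate_image(prompt, output_path="output.png", seed=None):
    """Generate one image for `prompt` and save it to `output_path`."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision is an assumption made here to lower VRAM use.
    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")

    # An optional seed makes the sampled image reproducible.
    generator = None
    if seed is not None:
        generator = torch.Generator("cuda").manual_seed(seed)

    image = pipe(prompt, generator=generator).images[0]
    image.save(output_path)
    return output_path
```

Calling generate_image("a photograph of an astronaut riding a horse", seed=0) would download the checkpoint on first use and write the result to output.png.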

Highlighted Details

  • Offers multiple checkpoints (v1-1 to v1-4) with varying training steps and data filtering.
  • Includes a reference sampling script with a Safety Checker Module and invisible watermarking.
  • Supports image-to-image translation and upscaling via an img2img script.
  • Codebase builds upon OpenAI's ADM and lucidrains' denoising-diffusion-pytorch.

Maintenance & Community

  • Developed through a collaboration between CompVis, Stability AI, and Runway.
  • Weights are available via the CompVis organization on Hugging Face.
  • Diffusers integration is expected to see active community development.

Licensing & Compatibility

  • Licensed under the CreativeML OpenRAIL-M license, which permits commercial use but includes use-based restrictions to prevent misuse.
  • Users are advised to implement additional safety mechanisms for production services due to known limitations and biases.

Limitations & Caveats

The model mirrors biases present in its training data, potentially leading to unintended or harmful outputs. The provided weights are considered research artifacts, and ethical deployment requires careful consideration and ongoing research.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

983 stars in the last 90 days