stable-diffusion by CompVis

Latent text-to-image diffusion model

Created 3 years ago
71,476 stars

Top 0.2% on SourcePulse

Project Summary

Stable Diffusion is a latent text-to-image diffusion model that generates high-resolution images from textual prompts. It is designed for researchers and developers working on generative AI, offering a relatively lightweight architecture for text-to-image synthesis and image modification tasks.

How It Works

This model leverages a latent diffusion approach, operating in a lower-dimensional latent space to reduce computational requirements. It uses a frozen CLIP ViT-L/14 text encoder for conditioning on text prompts and an 860M UNet for the diffusion process. This architecture allows it to run on GPUs with at least 10GB VRAM, making it more accessible than models operating directly in pixel space.
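To make the latent-space mechanics concrete, the sketch below reassembles the pieces described above (frozen CLIP text encoder, 860M UNet, VAE) from Hugging Face Diffusers and Transformers components. It is a minimal illustration under stated assumptions, not the repository's own sampling script: the PNDM scheduler, 50-step count, guidance scale of 7.5, and the 0.18215 latent scaling factor are taken from the public v1-4 release rather than from this codebase.

    import torch
    from transformers import CLIPTextModel, CLIPTokenizer
    from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

    device = "cuda"

    # Frozen CLIP ViT-L/14 text encoder used for prompt conditioning.
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)

    # The VAE maps between pixel space and the 4x64x64 latent space; the UNet denoises latents.
    vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae").to(device)
    unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet").to(device)
    scheduler = PNDMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")

    prompt = ["a photograph of an astronaut riding a horse"]
    guidance_scale = 7.5  # assumed default for classifier-free guidance

    def encode(texts):
        # Tokenize and run the frozen text encoder; returns per-token embeddings.
        tokens = tokenizer(texts, padding="max_length", max_length=tokenizer.model_max_length,
                           truncation=True, return_tensors="pt")
        with torch.no_grad():
            return text_encoder(tokens.input_ids.to(device))[0]

    # Unconditional ("") and conditional embeddings for classifier-free guidance.
    text_emb = torch.cat([encode([""] * len(prompt)), encode(prompt)])

    # Start from Gaussian noise in latent space (a 512x512 image corresponds to 64x64 latents).
    latents = torch.randn(len(prompt), unet.config.in_channels, 64, 64, device=device)
    scheduler.set_timesteps(50)  # assumed step count
    latents = latents * scheduler.init_noise_sigma

    for t in scheduler.timesteps:
        latent_in = scheduler.scale_model_input(torch.cat([latents] * 2), t)
        with torch.no_grad():
            noise_pred = unet(latent_in, t, encoder_hidden_states=text_emb).sample
        uncond, cond = noise_pred.chunk(2)
        noise_pred = uncond + guidance_scale * (cond - uncond)
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Decode latents back to pixel space with the VAE (0.18215 is the v1 latent scaling factor).
    with torch.no_grad():
        image = vae.decode(latents / 0.18215).sample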

Quick Start & Requirements

  • Installation: Create and activate a conda environment (conda env create -f environment.yaml, conda activate ldm), then install dependencies (pip install transformers==4.19.2 diffusers invisible-watermark, pip install -e .).
  • Prerequisites: Python, PyTorch, Transformers, Diffusers.
  • Usage: Run sampling scripts like python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse".
  • Diffusers Integration: Use StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda") (a fuller example follows this list).
  • Resources: Requires a GPU with at least 10GB VRAM.
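As a minimal end-to-end sketch of the Diffusers route mentioned above (the half-precision setting and the output filename are assumptions, not part of this repository):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the v1-4 weights published under the CompVis organization on Hugging Face.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,  # assumed: half precision to fit comfortably in ~10GB VRAM
    ).to("cuda")

    prompt = "a photograph of an astronaut riding a horse"
    image = pipe(prompt).images[0]  # the pipeline returns PIL images
    image.save("astronaut_rides_horse.png")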

Highlighted Details

  • Offers multiple checkpoints (v1-1 to v1-4) with varying training steps and data filtering.
  • Includes a reference sampling script with a built-in safety checker module and invisible watermarking of outputs.
  • Supports image-to-image translation and upscaling via an img2img script (see the sketch after this list).
  • Codebase builds upon OpenAI's ADM and lucidrains' denoising-diffusion-pytorch.
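The image-to-image bullet above refers to the repository's img2img script; a rough equivalent using the Diffusers StableDiffusionImg2ImgPipeline is sketched below. The input path, prompt, and strength value are placeholders, and the keyword names follow current Diffusers releases rather than this codebase.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,  # assumed: half precision
    ).to("cuda")

    # strength controls how much noise is added to the init image before denoising;
    # values near 1.0 let the result deviate more from the original.
    init_image = Image.open("sketch.jpg").convert("RGB").resize((512, 512))  # placeholder path
    result = pipe(
        prompt="A fantasy landscape, trending on artstation",  # placeholder prompt
        image=init_image,
        strength=0.8,
        guidance_scale=7.5,
    ).images[0]
    result.save("fantasy_landscape.png")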

Maintenance & Community

  • Developed through a collaboration between CompVis, Stability AI, and Runway.
  • Weights are available via the CompVis organization on Hugging Face.
  • Diffusers integration is expected to see active community development.

Licensing & Compatibility

  • Licensed under the CreativeML OpenRAIL-M license, which permits commercial use but includes use-based restrictions to prevent misuse.
  • Users are advised to implement additional safety mechanisms for production services due to known limitations and biases.

Limitations & Caveats

The model mirrors biases present in its training data, potentially leading to unintended or harmful outputs. The provided weights are considered research artifacts, and ethical deployment requires careful consideration and ongoing research.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

223 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.4% · 4k stars
Image synthesis research paper using a linear diffusion transformer
Created 11 months ago · Updated 5 days ago
Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

0.0% · 3k stars
Multilingual text-to-image latent diffusion model
Created 2 years ago · Updated 1 year ago
Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 7 more.

glide-text2im by openai

0.1% · 4k stars
Text-conditional image synthesis model from research paper
Created 3 years ago · Updated 1 year ago