stable-diffusion by CompVis

Latent text-to-image diffusion model

Created 3 years ago
71,476 stars

Top 0.2% on SourcePulse

Project Summary

Stable Diffusion is a latent text-to-image diffusion model that generates high-resolution images from textual prompts. It is designed for researchers and developers working on generative AI, offering a relatively lightweight architecture for text-to-image synthesis and image modification tasks.

How It Works

This model leverages a latent diffusion approach, operating in a lower-dimensional latent space to reduce computational requirements. It uses a frozen CLIP ViT-L/14 text encoder for conditioning on text prompts and an 860M UNet for the diffusion process. This architecture allows it to run on GPUs with at least 10GB VRAM, making it more accessible than models operating directly in pixel space.
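To make the latent-space mechanics concrete, the sketch below reassembles the pieces described above (frozen CLIP text encoder, 860M UNet, VAE) from Hugging Face Diffusers and Transformers components. It is a minimal illustration under stated assumptions, not the repository's own sampling script: the PNDM scheduler, 50-step count, guidance scale of 7.5, and the 0.18215 latent scaling factor are taken from the public v1-4 release rather than from this codebase.

    import torch
    from transformers import CLIPTextModel, CLIPTokenizer
    from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler

    device = "cuda"

    # Frozen CLIP ViT-L/14 text encoder used for prompt conditioning.
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)

    # The VAE maps between pixel space and the 4x64x64 latent space; the UNet denoises latents.
    vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae").to(device)
    unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet").to(device)
    scheduler = PNDMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")

    prompt = ["a photograph of an astronaut riding a horse"]
    guidance_scale = 7.5  # assumed default for classifier-free guidance

    def encode(texts):
        # Tokenize and run the frozen text encoder; returns per-token embeddings.
        tokens = tokenizer(texts, padding="max_length", max_length=tokenizer.model_max_length,
                           truncation=True, return_tensors="pt")
        with torch.no_grad():
            return text_encoder(tokens.input_ids.to(device))[0]

    # Unconditional ("") and conditional embeddings for classifier-free guidance.
    text_emb = torch.cat([encode([""] * len(prompt)), encode(prompt)])

    # Start from Gaussian noise in latent space (a 512x512 image corresponds to 64x64 latents).
    latents = torch.randn(len(prompt), unet.config.in_channels, 64, 64, device=device)
    scheduler.set_timesteps(50)  # assumed step count
    latents = latents * scheduler.init_noise_sigma

    for t in scheduler.timesteps:
        latent_in = scheduler.scale_model_input(torch.cat([latents] * 2), t)
        with torch.no_grad():
            noise_pred = unet(latent_in, t, encoder_hidden_states=text_emb).sample
        uncond, cond = noise_pred.chunk(2)
        noise_pred = uncond + guidance_scale * (cond - uncond)
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Decode latents back to pixel space with the VAE (0.18215 is the v1 latent scaling factor).
    with torch.no_grad():
        image = vae.decode(latents / 0.18215).sample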

Quick Start & Requirements

  • Installation: Create and activate a conda environment (conda env create -f environment.yaml, conda activate ldm), then install dependencies (pip install transformers==4.19.2 diffusers invisible-watermark, pip install -e .).
  • Prerequisites: Python, PyTorch, Transformers, Diffusers.
  • Usage: Run sampling scripts like python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse".
  • Diffusers Integration: Use StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda") (a fuller example follows this list).
  • Resources: Requires a GPU with at least 10GB VRAM.
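As a minimal end-to-end sketch of the Diffusers route mentioned above (the half-precision setting and the output filename are assumptions, not part of this repository):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load the v1-4 weights published under the CompVis organization on Hugging Face.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,  # assumed: half precision to fit comfortably in ~10GB VRAM
    ).to("cuda")

    prompt = "a photograph of an astronaut riding a horse"
    image = pipe(prompt).images[0]  # the pipeline returns PIL images
    image.save("astronaut_rides_horse.png")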

Highlighted Details

  • Offers multiple checkpoints (v1-1 to v1-4) with varying training steps and data filtering.
  • Includes a reference sampling script with a built-in safety checker module and invisible watermarking of outputs.
  • Supports image-to-image translation and upscaling via an img2img script (see the sketch after this list).
  • Codebase builds upon OpenAI's ADM and lucidrains' denoising-diffusion-pytorch.
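The image-to-image bullet above refers to the repository's img2img script; a rough equivalent using the Diffusers StableDiffusionImg2ImgPipeline is sketched below. The input path, prompt, and strength value are placeholders, and the keyword names follow current Diffusers releases rather than this codebase.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,  # assumed: half precision
    ).to("cuda")

    # strength controls how much noise is added to the init image before denoising;
    # values near 1.0 let the result deviate more from the original.
    init_image = Image.open("sketch.jpg").convert("RGB").resize((512, 512))  # placeholder path
    result = pipe(
        prompt="A fantasy landscape, trending on artstation",  # placeholder prompt
        image=init_image,
        strength=0.8,
        guidance_scale=7.5,
    ).images[0]
    result.save("fantasy_landscape.png")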

Maintenance & Community

  • Developed through a collaboration between CompVis, Stability AI, and Runway.
  • Weights are available via the CompVis organization on Hugging Face.
  • Diffusers integration is expected to see active community development.

Licensing & Compatibility

  • Licensed under the CreativeML OpenRAIL-M license, which permits commercial use but includes use-based restrictions to prevent misuse.
  • Users are advised to implement additional safety mechanisms for production services due to known limitations and biases.

Limitations & Caveats

The model mirrors biases present in its training data, potentially leading to unintended or harmful outputs. The provided weights are considered research artifacts, and ethical deployment requires careful consideration and ongoing research.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

223 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.4% · 4k stars
Image synthesis research paper using a linear diffusion transformer
Created 11 months ago · Updated 5 days ago
Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

0.0% · 3k stars
Multilingual text-to-image latent diffusion model
Created 2 years ago · Updated 1 year ago
Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 7 more.

glide-text2im by openai

0.1% · 4k stars
Text-conditional image synthesis model from research paper
Created 3 years ago · Updated 1 year ago