Latent text-to-image diffusion model
Stable Diffusion is a latent text-to-image diffusion model that generates high-resolution images from textual prompts. It is designed for researchers and developers working on generative AI, offering a relatively lightweight architecture for text-to-image synthesis and image modification tasks.
How It Works
This model leverages a latent diffusion approach, operating in a lower-dimensional latent space to reduce computational requirements. It uses a frozen CLIP ViT-L/14 text encoder for conditioning on text prompts and an 860M UNet for the diffusion process. This architecture allows it to run on GPUs with at least 10GB VRAM, making it more accessible than models operating directly in pixel space.
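To make the latent-space claim concrete, the sketch below (an illustration assuming the Hugging Face diffusers package and the public v1-4 checkpoint, not code from this repository) shows the compression the VAE applies before the UNet ever runs:

```python
# Minimal sketch of latent-space compression, assuming the Hugging Face
# diffusers package and the public v1-4 checkpoint.
import torch
from diffusers import AutoencoderKL

# The VAE maps a 512x512 RGB image to a 4x64x64 latent, so each UNet
# denoising step operates on roughly 48x fewer values than pixel space.
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

image = torch.randn(1, 3, 512, 512)  # stand-in for a preprocessed image
with torch.no_grad():
    latent = vae.encode(image).latent_dist.sample()

print(tuple(image.shape), "->", tuple(latent.shape))  # (1, 3, 512, 512) -> (1, 4, 64, 64)
```

Because diffusion happens on the small latent rather than the full image, memory and compute per step drop accordingly, which is what makes the ~10GB VRAM figure attainable.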
Quick Start & Requirements
Create and activate the conda environment (conda env create -f environment.yaml, conda activate ldm), then install the remaining dependencies (pip install transformers==4.19.2 diffusers invisible-watermark, pip install -e .). Sample images with python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse". The model can also be loaded through the diffusers library with StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda"), as sketched below.
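Expanded into a runnable form, the diffusers route looks roughly like this (float16 weights are an assumption chosen to fit modest VRAM, not a requirement):

```python
# Sketch of text-to-image sampling via diffusers; the dtype and output
# path are assumptions, not mandated by the repository.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]  # text -> latent diffusion -> decoded image
image.save("astronaut.png")
```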
Highlighted Details
Beyond text-to-image sampling, the repository includes an img2img script for modifying existing images from a text prompt (an equivalent diffusers-based sketch follows below). The implementation builds on the denoising-diffusion-pytorch codebase.
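As a hedged illustration of the image-modification workflow, diffusers exposes an img2img pipeline that mirrors the script (file names here are placeholders, and the image argument was named init_image in older diffusers releases):

```python
# Sketch of img2img-style editing via diffusers; the paths and strength
# value are placeholder choices, not values prescribed by the repository.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("input.png").convert("RGB").resize((512, 512))
# strength sets how much noise is applied to the init image:
# near 0.0 keeps it almost unchanged, 1.0 discards it entirely.
out = pipe(prompt="a fantasy landscape", image=init, strength=0.8).images[0]
out.save("output.png")
```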
Maintenance & Community
The upstream repository's most recent commit was about a year ago.
Licensing & Compatibility
The model weights are released under the CreativeML Open RAIL-M license, which permits commercial use subject to use-based restrictions.
Limitations & Caveats
The model mirrors biases present in its training data, potentially leading to unintended or harmful outputs. The provided weights are considered research artifacts, and ethical deployment requires careful consideration and ongoing research.