Stability-AI: Latent diffusion model for high-resolution image synthesis
Top 0.7% on SourcePulse
This repository provides Stable Diffusion models for high-resolution image synthesis, targeting researchers and developers interested in text-to-image generation. It offers several model versions, including SD 2.0 and 2.1, all built on latent diffusion models, with support for text-guided image generation, image modification, upscaling, and inpainting.
How It Works
Stable Diffusion is a latent text-to-image diffusion model. It conditions a U-Net denoiser on text embeddings from an OpenCLIP-ViT/H text encoder. The SD 2.x models are trained from scratch; the 768-resolution checkpoints use a v-prediction objective, while the 512-base checkpoints use standard noise (epsilon) prediction. Diffusion runs in the lower-dimensional latent space of an autoencoder, which keeps high-resolution synthesis and the various image manipulation tasks computationally efficient.
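As a rough illustration of the v-parameterization mentioned above, the sketch below shows how a model's v-prediction relates to the noise and the clean latent under the usual alpha/sigma schedule notation. The function and variable names are standard diffusion notation used for illustration, not identifiers from this repository.

```python
# Hedged sketch of the v-prediction relation: v = alpha_t * eps - sigma_t * x0,
# where x_t = alpha_t * x0 + sigma_t * eps is the noised latent at timestep t.
import torch

def x0_and_eps_from_v(v, x_t, alpha_t, sigma_t):
    """Recover the implied clean latent x0 and noise eps from a v-prediction."""
    x0 = alpha_t * x_t - sigma_t * v
    eps = sigma_t * x_t + alpha_t * v
    return x0, eps

# Toy usage with random tensors standing in for latents
# (SD latents are 4 x 64 x 64 for a 512 x 512 image).
x_t = torch.randn(1, 4, 64, 64)   # noisy latent at timestep t
v = torch.randn_like(x_t)         # model output under the v-parameterization
alpha_t, sigma_t = 0.8, 0.6       # schedule values with alpha_t**2 + sigma_t**2 = 1
x0, eps = x0_and_eps_from_v(v, x_t, alpha_t, sigma_t)
```

Samplers that support v-prediction perform this conversion internally when turning model outputs into denoised latents; the point here is only that a v-prediction carries the same information as a noise prediction.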
Quick Start & Requirements
- Install the repository in editable mode: pip install -e .
- Optional: xformers for efficient attention on GPUs. Installation involves cloning the xformers repository, compiling it (which can take up to 30 minutes), and installing it.
- For running on Intel CPUs: numactl, libjemalloc-dev, intel-openmp, intel_extension_for_pytorch.
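Once the environment is set up, a quick way to sanity-check text-to-image generation is through the Hugging Face diffusers wrapper rather than the repository's own scripts. The sketch below assumes diffusers, transformers, and a CUDA GPU are available, and uses the published stabilityai/stable-diffusion-2-1 checkpoint.

```python
# Minimal sketch: text-to-image with SD 2.1 via the Hugging Face diffusers wrapper.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,  # fp16 for GPU memory savings; use float32 on CPU
)
pipe = pipe.to("cuda")

prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("astronaut.png")
```

On GPUs with limited VRAM, switching to the 512-base checkpoint or calling pipe.enable_attention_slicing() are common workarounds.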
Maintenance & Community
The project acknowledges contributions from Hugging Face, LAION, and the DeepFloyd team. The codebase builds on OpenAI's ADM codebase and lucidrains/denoising-diffusion-pytorch.
Licensing & Compatibility
Limitations & Caveats
The models mirror biases present in their training data. The provided weights are research artifacts; use in production requires additional safety mechanisms. The README notes potential numerical instabilities with FP16 precision on the v2.1 model's vanilla attention module.