StableCascade by Stability-AI

Image generation model using cascaded diffusion

created 1 year ago
6,594 stars

Project Summary

This is the official codebase for Stable Cascade, a text-to-image diffusion model built on the Würstchen architecture. It targets researchers and developers who want efficient, high-quality image generation. Because the model operates in a highly compressed latent space (42x compression factor), inference is faster and training is cheaper, and in human evaluations it outperforms models like Stable Diffusion XL in prompt alignment and aesthetic quality.

How It Works

Stable Cascade employs a three-stage cascade. Stage A (VAE) and Stage B handle compression, encoding images into a small latent space (24x24 for a 1024x1024 image). Stage C (diffusion model) generates these latents from text prompts, and Stages B and A then decode them back into a full-resolution image. This approach allows faster inference and cheaper training than models with larger latent spaces, while maintaining high-fidelity reconstructions.
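The 42x figure can be checked with a line of arithmetic. The sketch below assumes a 1024x1024 input, the resolution the upstream README pairs with the 24x24 latent; the function name is hypothetical and exists only for illustration.

```python
def spatial_compression_factor(image_side: int, latent_side: int) -> float:
    """Ratio of the image side length to the latent side length."""
    if image_side <= 0 or latent_side <= 0:
        raise ValueError("side lengths must be positive")
    return image_side / latent_side


# Assumption: 1024x1024 pixels in, 24x24 latent out.
factor = spatial_compression_factor(1024, 24)
print(f"per-side compression: {factor:.1f}x")  # 42.7x, quoted as roughly 42x
print(f"spatial positions: {1024 * 1024} pixels -> {24 * 24} latent positions")
```

The factor is per side, so the number of spatial positions Stage C has to model shrinks by about 42 squared, which is where the inference and training savings come from.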

Quick Start & Requirements

  • Install dependencies with pip install gradio accelerate, then install the diffusers branch with pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3.
  • Run the Gradio app from the repository root with PYTHONPATH=./ python3 gradio_app/app.py.
  • Official documentation and usage examples are available in the 🤗 diffusers library.
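As a rough sketch of what usage through diffusers can look like: the example below assumes a diffusers version that exposes StableCascadePriorPipeline and StableCascadeDecoderPipeline, a CUDA GPU, and the Hugging Face checkpoint IDs shown. The checkpoint IDs, dtypes, step counts, and guidance scales are assumptions not stated on this page, and the function name is hypothetical; check them against the diffusers documentation before relying on them.

```python
# Assumed checkpoint IDs on the Hugging Face Hub (not stated in this summary).
PRIOR_ID = "stabilityai/stable-cascade-prior"
DECODER_ID = "stabilityai/stable-cascade"


def generate_image(prompt: str, negative_prompt: str = "", device: str = "cuda"):
    """Two-step generation: Stage C (prior), then Stages B and A (decoder)."""
    # Imports are local so this function can be defined without torch installed.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    prior = StableCascadePriorPipeline.from_pretrained(
        PRIOR_ID, variant="bf16", torch_dtype=torch.bfloat16
    ).to(device)
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        DECODER_ID, variant="bf16", torch_dtype=torch.float16
    ).to(device)

    # Stage C: text prompt -> small image embeddings (the compressed latents).
    prior_output = prior(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=1024,
        width=1024,
        guidance_scale=4.0,        # assumed setting
        num_inference_steps=20,    # assumed setting
    )

    # Stages B and A: compressed latents -> full-resolution PIL image.
    images = decoder(
        image_embeddings=prior_output.image_embeddings.to(torch.float16),
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=0.0,        # assumed setting
        num_inference_steps=10,    # assumed setting
        output_type="pil",
    ).images
    return images[0]
```

Calling generate_image("an astronaut riding a horse").save("out.png") would download both checkpoints on first use, so expect a large download and significant GPU memory use.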

Highlighted Details

  • Achieves superior prompt alignment and aesthetic quality in human evaluations against models like SDXL and Playground v2.
  • Offers faster inference times despite a larger parameter count than SDXL.
  • Supports extensions like finetuning, LoRA, ControlNet, IP-Adapter, and LCM.
  • Provides training scripts for the model, ControlNet, and LoRA.
  • Includes a diffusion autoencoder (Stage A & B) for custom model training in a compressed space.

Maintenance & Community

The codebase is in early development, with potential for future updates and optimizations based on community interest. Feedback and contributions are welcomed.

Licensing & Compatibility

The code is released under the MIT License. Model weights are under the Stability AI Non-Commercial Research Community License, which restricts commercial use.

Limitations & Caveats

The codebase is in early development and may contain errors or unoptimized code. The model weights are restricted to non-commercial and research community use.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 25 stars in the last 90 days
