StableCascade by Stability-AI

Image generation model using cascaded diffusion

Created 1 year ago

6,585 stars

Top 7.7% on SourcePulse

Project Summary

Stable Cascade is the official codebase for a text-to-image diffusion model built on the Würstchen architecture, aimed at researchers and developers who want efficient, high-quality image generation. It operates in a highly compressed latent space (a compression factor of 42), which makes inference faster and training cheaper. In the project's reported human evaluations, it outperforms models such as Stable Diffusion XL in prompt alignment and aesthetic quality.

How It Works

Stable Cascade uses a three-stage cascade. Stage A (a VAE) and Stage B together compress a 1024x1024 image into a small 24x24 latent and decode it back, while Stage C (a diffusion model) generates those latents from text prompts. Because Stage C works on such a small latent, inference is faster and training is cheaper than for models with larger latent spaces, while reconstructions remain high-fidelity.
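The arithmetic behind the compression factor can be checked with a short sketch. It assumes a 1024x1024 input image (the resolution at which the 24x24 latent is usually quoted) and, for comparison, a factor-8 VAE as used by Stable Diffusion; the helper name `spatial_compression` is hypothetical.

```python
def spatial_compression(image_side: int, latent_side: int) -> float:
    """Per-side compression factor: image pixels per latent position along one axis."""
    return image_side / latent_side


IMAGE_SIDE = 1024          # assumed input resolution
CASCADE_LATENT_SIDE = 24   # Stage C latent size from the text above
SD_LATENT_SIDE = 128       # 1024 / 8, a factor-8 VAE for comparison

cascade_factor = spatial_compression(IMAGE_SIDE, CASCADE_LATENT_SIDE)
sd_factor = spatial_compression(IMAGE_SIDE, SD_LATENT_SIDE)

# Number of spatial positions the text-conditional model has to denoise.
cascade_positions = CASCADE_LATENT_SIDE ** 2
sd_positions = SD_LATENT_SIDE ** 2

print(f"Stable Cascade: {cascade_factor:.1f}x per side, {cascade_positions} latent positions")
print(f"Factor-8 VAE:   {sd_factor:.1f}x per side, {sd_positions} latent positions")
print(f"Ratio of latent positions: {sd_positions / cascade_positions:.1f}x")
```

The per-side factor works out to roughly 42.7, which is the figure the summary rounds to 42.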

Quick Start & Requirements

Install the dependencies with pip install gradio accelerate, then install the diffusers fork with pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3.
Run the Gradio app from the repository root with PYTHONPATH=./ python3 gradio_app/app.py.
Official documentation and usage examples are available in the 🤗 diffusers documentation.
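As a rough illustration of the diffusers route, the sketch below chains the two pipelines: a prior (Stage C) that turns a prompt into image embeddings, and a decoder (Stages B and A) that turns those embeddings into an image. It assumes a diffusers build that exposes `StableCascadePriorPipeline` and `StableCascadeDecoderPipeline`, the `stabilityai/stable-cascade-prior` and `stabilityai/stable-cascade` checkpoints, and a CUDA GPU. The wrapper name `generate`, the step counts, and the guidance scales are illustrative choices, not settings taken from this page.

```python
def generate(prompt: str, negative_prompt: str = "", device: str = "cuda"):
    """Text -> Stage C image embeddings -> Stage B/A decoding -> PIL image."""
    # Imported lazily so this module loads even without the heavy dependencies.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    # Stage C: text-conditional prior working in the small latent space.
    prior = StableCascadePriorPipeline.from_pretrained(
        "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
    ).to(device)
    # Stages B and A: decode the small latent back to pixel space.
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        "stabilityai/stable-cascade", torch_dtype=torch.float16
    ).to(device)

    prior_output = prior(
        prompt=prompt,
        negative_prompt=negative_prompt,
        height=1024,
        width=1024,
        guidance_scale=4.0,        # illustrative value
        num_inference_steps=20,    # illustrative value
    )
    result = decoder(
        # Cast to the decoder's dtype before handing the embeddings over.
        image_embeddings=prior_output.image_embeddings.to(torch.float16),
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=0.0,        # illustrative value
        num_inference_steps=10,    # illustrative value
        output_type="pil",
    )
    return result.images[0]
```

Calling `generate("an astronaut riding a horse").save("out.png")` would download both checkpoints on first use, so expect a large initial download and significant GPU memory use.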

Highlighted Details

Achieves superior prompt alignment and aesthetic quality in human evaluations against models like SDXL and Playground v2.
Offers faster inference times despite a larger parameter count than SDXL.
Supports extensions like finetuning, LoRA, ControlNet, IP-Adapter, and LCM.
Provides training scripts for the model, ControlNet, and LoRA.
Includes a diffusion autoencoder (Stage A & B) for custom model training in a compressed space.

Maintenance & Community

The repository describes the codebase as being in early development, with further updates and optimizations dependent on community interest. Feedback and contributions are welcome.

Licensing & Compatibility

The code is released under the MIT License. The model weights are released under the Stability AI Non-Commercial Research Community License, which restricts commercial use.

Limitations & Caveats

The codebase is in early development and may contain errors or unoptimized code. The model weights are restricted to non-commercial and research community use.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Star History

6 stars in the last 30 days