Image generation model using cascaded diffusion
Stable Cascade is the official codebase for a text-to-image diffusion model built on the Würstchen architecture, aimed at researchers and developers who need efficient, high-quality image generation. By operating in a highly compressed latent space (a 42x spatial compression factor), it substantially reduces inference time and training cost while outperforming models such as Stable Diffusion XL in prompt alignment and aesthetic quality.
How It Works
Stable Cascade employs a three-stage cascade: Stage A (a VAE) and Stage B (a diffusion decoder) compress images into a compact 24x24 latent space and reconstruct them with high fidelity, while Stage C (a text-conditional diffusion model) generates those latents from prompts. Because Stage C works on such a small latent grid, inference is faster and training is cheaper than for models operating in larger latent spaces.
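The savings from the small latent space can be sketched with a quick back-of-the-envelope calculation. This is only an illustration: the 1024x1024 RGB input and 24x24 latent grid come from the description above, while the 16-channel latent depth is an assumed value for the sketch.

```python
# Back-of-the-envelope view of Stable Cascade's compression budget.
# Assumptions: 1024x1024 RGB input, 24x24 latent grid, 16 latent
# channels (the channel count is illustrative, not from the source).

def spatial_compression(image_side: int, latent_side: int) -> float:
    """Per-dimension compression factor (Würstchen reports ~42x)."""
    return image_side / latent_side

def element_ratio(image_side: int, image_ch: int,
                  latent_side: int, latent_ch: int) -> float:
    """How many fewer values Stage C must denoise vs. raw pixels."""
    return (image_side ** 2 * image_ch) / (latent_side ** 2 * latent_ch)

print(spatial_compression(1024, 24))       # ~42.7x per spatial dimension
print(element_ratio(1024, 3, 24, 16))      # ~341x fewer elements to denoise
```

The roughly 42x per-dimension compression is why a diffusion model in this latent space is cheap to train and sample: the number of values being denoised shrinks by orders of magnitude.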
Quick Start & Requirements
pip install gradio accelerate
pip install git+https://github.com/kashif/diffusers.git@wuerstchen-v3
PYTHONPATH=./ python3 gradio_app/app.py
Maintenance & Community
The codebase is in early development, with potential for future updates and optimizations based on community interest. Feedback and contributions are welcomed.
Licensing & Compatibility
The code is released under the MIT License. Model weights are covered by the Stability AI Non-Commercial Research Community License, which restricts commercial use.
Limitations & Caveats
The codebase is in early development and may contain errors or unoptimized code. The model weights are restricted to non-commercial and research community use.