Wuerstchen by dome272

Text-to-image research paper on efficient diffusion model pretraining

Created 2 years ago

556 stars

Top 57.5% on SourcePulse

3 Experts Love This Project

Jiaming Song

Chief Scientist at Luma AI

Omar Sanseviero

DevRel at Google DeepMind

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

Project Summary

Würstchen is an efficient framework for training text-to-image diffusion models by operating in a highly compressed latent space. It targets researchers and developers seeking faster, more computationally affordable training of high-resolution image generation models, achieving significant compression with faithful reconstruction.

How It Works

Würstchen uses a three-stage design (Stages A, B, and C). Stages A and B progressively compress images into a very compact latent space, reaching a 42x spatial compression factor. Stage C then learns text-conditional generation within this compressed space. Because Stage C trains on such small latents, it needs far less compute, which makes model development faster and cheaper.
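To make the compression factor concrete, the following minimal sketch estimates the side length of the compressed latent for a square image. It assumes the 42x factor applies per spatial side and that the result is rounded to the nearest integer; the function name latent_side and the rounding rule are illustrative assumptions, not taken from the project.

```python
def latent_side(image_side: int, spatial_factor: float = 42.0) -> int:
    """Estimate the latent side length for a square image.

    Assumption: the compression factor divides each spatial side,
    and the result is rounded to the nearest integer.
    """
    return round(image_side / spatial_factor)


for side in (512, 1024):
    s = latent_side(side)
    # Number of spatial positions the text-conditional stage has to model.
    print(f"{side}x{side} image -> about {s}x{s} latent ({s * s} positions)")
```

Under these assumptions a 512x512 image maps to roughly a 12x12 latent and a 1024x1024 image to roughly 24x24, which is in line with the 12x12 latent size quoted for Stage C.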

Quick Start & Requirements

Install via pip: pip install -U transformers accelerate diffusers
Requires PyTorch; a CUDA-enabled GPU is recommended for optimal performance.
A usage example and the official documentation are available at: https://huggingface.co/docs/diffusers/main/en/api/pipelines/wuerstchen
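As a rough illustration of what usage through diffusers can look like, here is a minimal sketch. It assumes the pretrained weights are published under the Hugging Face model id "warp-ai/wuerstchen" and that a CUDA GPU is available; the helper name generate_images is hypothetical. The linked documentation remains the authoritative reference.

```python
def generate_images(prompt: str, height: int = 1024, width: int = 1024, num_images: int = 1):
    """Generate images from a text prompt with the combined Würstchen pipeline.

    Heavy dependencies are imported inside the function so that defining
    it does not require torch or diffusers to be installed.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    # Assumption: "warp-ai/wuerstchen" is the model id hosting the weights.
    pipe = AutoPipelineForText2Image.from_pretrained(
        "warp-ai/wuerstchen", torch_dtype=torch.float16
    ).to("cuda")

    # The combined pipeline runs the text-conditional prior and the decoder in one call.
    result = pipe(prompt, height=height, width=width, num_images_per_prompt=num_images)
    return result.images
```

Calling generate_images("an astronaut riding a horse") would download the weights on first use and return a list of PIL images; the prompt here is only an example.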

Highlighted Details

Achieves a 42x spatial compression factor while maintaining faithful image reconstruction.
Enables training of the text-conditional stage (Stage C) in a significantly smaller 12x12 latent space.
Fully integrated into the Hugging Face diffusers library for easy adoption.
Offers pre-trained models (v1 and v2) supporting resolutions up to 1024x1024.

Maintenance & Community

Developed by dome272, with acknowledgments to Stability AI for compute resources.
Paper accepted to ICLR 2024 (oral presentation).
Citation details provided for academic use.

Licensing & Compatibility

No explicit license is mentioned in the README.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the project's license, which is a critical factor for commercial adoption or integration into closed-source projects. Further investigation into licensing is required.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days