Wuerstchen by dome272

Text-to-image research paper on efficient diffusion model pretraining

created 2 years ago
549 stars

Top 59.0% on sourcepulse

Project Summary

Würstchen is an efficient framework for training text-to-image diffusion models by operating in a highly compressed latent space. It targets researchers and developers seeking faster, more computationally affordable training of high-resolution image generation models, achieving significant compression with faithful reconstruction.

How It Works

Würstchen uses three stages (A, B, and C). Stages A and B progressively compress images into a very compact latent space, reaching a 42x spatial compression factor. Stage C then learns text-conditional generation within this compressed space. Because Stage C trains on far smaller latents than a conventional latent diffusion model, its compute requirements drop sharply, which makes model development faster and cheaper.
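To make the 42x figure concrete, the short sketch below works out the approximate latent grid size for two common resolutions. It assumes the factor applies to each spatial side and that the result is rounded down; both the rounding rule and the helper names (latent_side, position_reduction) are illustrative and not taken from the project.

```python
# Back-of-the-envelope arithmetic for the quoted 42x compression factor.
# Assumptions (not from the project): the factor applies per spatial side,
# and the latent side length is rounded down.

COMPRESSION_FACTOR = 42


def latent_side(image_side: int, factor: int = COMPRESSION_FACTOR) -> int:
    """Approximate side length of the compressed latent grid."""
    return image_side // factor


def position_reduction(image_side: int, factor: int = COMPRESSION_FACTOR) -> float:
    """Ratio of pixel positions to latent positions for a square image."""
    side = latent_side(image_side, factor)
    return (image_side * image_side) / (side * side)


for res in (512, 1024):
    side = latent_side(res)
    print(f"{res}x{res} image -> roughly {side}x{side} latent grid, "
          f"about {position_reduction(res):.0f}x fewer spatial positions")
```

Under these assumptions a 512x512 image maps to a grid of about 12x12, and a 1024x1024 image to about 24x24.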

Highlighted Details

  • Achieves a 42x compression factor while maintaining faithful image reconstruction.
  • Enables training of the text-conditional stage (Stage C) in a significantly smaller 12x12 latent space.
  • Fully integrated into the Hugging Face diffusers library for easy adoption.
  • Offers pre-trained models (v1 and v2) supporting resolutions up to 1024x1024.
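As a rough illustration of the diffusers integration mentioned above, a minimal generation sketch might look like the following. The checkpoint id "warp-ai/wuerstchen", the guidance value, and the helper name generate_image are assumptions made for illustration; check the project and the diffusers documentation for the recommended settings. Running it requires the torch and diffusers packages, a CUDA-capable GPU, and a model download.

```python
def generate_image(prompt: str, height: int = 1024, width: int = 1024):
    """Generate one image with a Würstchen pipeline from diffusers (sketch)."""
    # Imports are kept inside the function so the module loads even when
    # the optional heavy dependencies are not installed.
    import torch
    from diffusers import AutoPipelineForText2Image

    # Assumed checkpoint id on the Hugging Face Hub; substitute the one
    # you intend to use.
    pipe = AutoPipelineForText2Image.from_pretrained(
        "warp-ai/wuerstchen", torch_dtype=torch.float16
    ).to("cuda")

    # prior_guidance_scale steers the text-conditional prior (Stage C);
    # 4.0 is an illustrative value, not a tuned recommendation.
    result = pipe(
        prompt=prompt,
        height=height,
        width=width,
        prior_guidance_scale=4.0,
    )
    return result.images[0]
```

Calling generate_image("an astronaut riding a horse") returns a PIL image that can be written to disk with its save method.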

Maintenance & Community

  • Developed by dome272, with acknowledgments to Stability AI for compute resources.
  • Paper accepted to ICLR 2024 (oral presentation).
  • Citation details provided for academic use.

Licensing & Compatibility

  • No explicit license is mentioned in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the project's license, which is a critical factor for commercial adoption or integration into closed-source projects. Further investigation into licensing is required.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
