Wuerstchen by dome272

Text-to-image research paper on efficient diffusion model pretraining

Created 2 years ago
553 stars

Top 57.9% on SourcePulse

Project Summary

Würstchen is an efficient framework for training text-to-image diffusion models by operating in a highly compressed latent space. It targets researchers and developers seeking faster, more computationally affordable training of high-resolution image generation models, achieving significant compression with faithful reconstruction.

How It Works

Würstchen uses a three-stage architecture (Stages A, B, and C). Stages A and B progressively compress images into a very low-dimensional latent space, reaching a 42x compression factor. Stage C then learns text-conditional generation within this compressed space. Because Stage C operates on such small latents, the compute needed to train it drops sharply, which makes model development faster and cheaper.
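The link between the compression factor and the latent grid size can be checked with a few lines of arithmetic. This is a minimal sketch, not code from the repository. It assumes the 42x figure is a per-side spatial factor (consistent with a 512x512 image mapping to the 12x12 grid listed under Highlighted Details). The 4x factor given to Stage A is a hypothetical split chosen only for illustration, with Stage B assumed to supply the rest.

```python
"""Shape arithmetic for a Würstchen-style compression pipeline (illustrative sketch)."""


def latent_side(image_side: int, factor: float) -> int:
    """Side length of the square grid after per-side downsampling by `factor`.

    Assumption: the factor applies to each spatial dimension and the result
    is rounded to the nearest integer.
    """
    return round(image_side / factor)


def stage_sides(image_side: int, stage_a_factor: float = 4, overall_factor: float = 42) -> dict:
    """Track the spatial side length through the two encoder stages.

    `stage_a_factor` is a hypothetical value; Stage B is assumed to provide the
    remaining downsampling so that the combined factor equals `overall_factor`.
    """
    return {
        "pixels": image_side,
        "after_stage_a": latent_side(image_side, stage_a_factor),
        "stage_c_latent": latent_side(image_side, overall_factor),
    }


if __name__ == "__main__":
    for side in (512, 1024):
        shapes = stage_sides(side)
        a = shapes["after_stage_a"]
        c = shapes["stage_c_latent"]
        # Prints e.g. "512x512 -> 128x128 -> 12x12 (Stage C works on 144 positions)"
        print(f"{side}x{side} -> {a}x{a} -> {c}x{c} (Stage C works on {c * c} positions)")
```

Generation runs in the opposite direction: Stage C produces the small latent from the text prompt, and Stages B and A expand it back to full resolution.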

Quick Start & Requirements

Highlighted Details

  • Achieves a 42x compression factor while maintaining faithful image reconstruction.
  • Enables training of the text-conditional stage (Stage C) in a significantly smaller 12x12 latent space.
  • Fully integrated into the Hugging Face diffusers library for easy adoption.
  • Offers pre-trained models (v1 and v2) supporting resolutions up to 1024x1024.
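To put the 12x12 grid in perspective, the sketch below compares the number of spatial positions, and the pairwise cost of full self-attention over them, against a larger latent grid. The 64x64 baseline is an assumption for comparison (the grid an 8x-downsampling autoencoder, as in Stable Diffusion 1.x, produces for a 512x512 image). Treating attention cost as quadratic in the number of positions is a simplification that ignores convolutional layers and other overheads.

```python
from __future__ import annotations


def positions(side: int) -> int:
    """Number of spatial positions (tokens) in a square latent grid."""
    return side * side


def attention_pairs(side: int) -> int:
    """Pairwise interactions in full self-attention over the grid (N^2)."""
    n = positions(side)
    return n * n


def compare(small_side: int, baseline_side: int) -> tuple[float, float]:
    """Return (position reduction, attention-cost reduction) of small vs. baseline."""
    token_ratio = positions(baseline_side) / positions(small_side)
    attn_ratio = attention_pairs(baseline_side) / attention_pairs(small_side)
    return token_ratio, attn_ratio


if __name__ == "__main__":
    # 12x12 Stage C latent vs. a hypothetical 64x64 baseline grid for a 512x512 image.
    tokens, attn = compare(small_side=12, baseline_side=64)
    print(f"{tokens:.1f}x fewer positions, {attn:.0f}x fewer attention pairs")
```

Under these assumptions the 12x12 grid has about 28x fewer positions and roughly 809x fewer attention pairs, which illustrates why moving the text-conditional stage into the compressed space lowers training cost.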

Maintenance & Community

  • Developed by dome272, with acknowledgments to Stability AI for compute resources.
  • Paper accepted to ICLR 2024 (oral presentation).
  • Citation details provided for academic use.

Licensing & Compatibility

  • No explicit license is mentioned in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not specify the project's license, which is a critical factor for commercial adoption or integration into closed-source projects. Further investigation into licensing is required.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

0.4%
4k
Image synthesis research paper using a linear diffusion transformer
Created 11 months ago
Updated 5 days ago
Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

0.0%
3k
Multilingual text-to-image latent diffusion model
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Chaoyu Yang (Founder of Bento), and 11 more.

IF by deep-floyd

0.0%
8k
Text-to-image model for photorealistic synthesis and language understanding
Created 2 years ago
Updated 1 year ago
Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 57 more.

stable-diffusion by CompVis

0.1%
71k
Latent text-to-image diffusion model
Created 3 years ago
Updated 1 year ago