taesd by madebyollin

Tiny AutoEncoder for Stable Diffusion latents

Created 2 years ago
779 stars

Top 44.9% on SourcePulse

Project Summary

TAESD is a highly optimized, tiny autoencoder designed to decode Stable Diffusion latents into full-size images with minimal computational cost. It targets users of Stable Diffusion, particularly those needing real-time previewing or efficient standalone VAE functionality, offering a substantial speedup over standard VAEs at a modest quality trade-off.

How It Works

TAESD is a distilled version of Stable Diffusion's VAE, featuring a significantly smaller encoder and decoder. It employs convolutional layers with ReLU activations and upsampling layers. This architecture allows it to process latents efficiently, trading fine detail fidelity for speed and a reduced parameter count, making it suitable for resource-constrained environments or real-time applications.
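As a rough illustration of the "convolutions, ReLU, and upsampling" pattern described above, here is a toy decoder sketch. It is not TAESD's actual layer layout: the width of 64, the single convolution per stage, and the names `conv_param_count`, `toy_decoder_param_count`, and `build_toy_decoder` are all assumptions made for this example. It assumes a 4-channel latent at 1/8 of the image resolution (as in SD1/2 and SDXL), so three 2x upsampling stages bring it back to full size. PyTorch is imported lazily so the parameter arithmetic can be run without it.

```python
def conv_param_count(c_in, c_out, kernel=3, bias=True):
    """Parameters in one 2D convolution: weights plus optional bias."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


def toy_decoder_param_count(latent_channels=4, width=64, num_upsamples=3):
    """Analytic parameter count for the toy decoder built below."""
    total = conv_param_count(latent_channels, width)      # input conv
    total += num_upsamples * conv_param_count(width, width)  # one conv per stage
    total += conv_param_count(width, 3)                   # output conv to RGB
    return total


def build_toy_decoder(latent_channels=4, width=64, num_upsamples=3):
    """Build the toy decoder as a torch.nn.Sequential (hypothetical layout)."""
    import torch.nn as nn  # lazy import: only needed when building the model

    layers = [nn.Conv2d(latent_channels, width, 3, padding=1), nn.ReLU()]
    for _ in range(num_upsamples):
        layers += [
            nn.Conv2d(width, width, 3, padding=1),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),  # each stage doubles height and width
        ]
    layers.append(nn.Conv2d(width, 3, 3, padding=1))
    return nn.Sequential(*layers)
```

Even this toy version shows why such a network is cheap: a handful of narrow 3x3 convolutions adds up to roughly a hundred thousand parameters, far below the tens of millions in a full VAE.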

Quick Start & Requirements

  • Available via the diffusers library, with weights in safetensors format for each variant: taesd, taesdxl, taesd3, and taef1.
  • Can be integrated into A1111 and ComfyUI.
  • Requires Python. Specific hardware requirements depend on the integration and model size.
  • Official examples and integration guides are available in the README.
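A minimal sketch of the diffusers route, assuming the weights are hosted under the `madebyollin/` namespace on the Hugging Face Hub and loaded with the `AutoencoderTiny` class from diffusers. The family keys (`"sd"`, `"sdxl"`, `"sd3"`, `"flux"`) and the helper names `load_tiny_vae` and `swap_in_tiny_vae` are made up for this example; check the README for the officially recommended usage.

```python
# Assumed Hugging Face repo ids for each tiny autoencoder variant.
TINY_VAE_REPOS = {
    "sd": "madebyollin/taesd",      # SD1/2
    "sdxl": "madebyollin/taesdxl",  # SDXL
    "sd3": "madebyollin/taesd3",    # SD3
    "flux": "madebyollin/taef1",    # FLUX.1
}


def load_tiny_vae(family="sd"):
    """Download and return the tiny autoencoder for a model family."""
    if family not in TINY_VAE_REPOS:
        raise ValueError(f"unknown model family: {family!r}")
    from diffusers import AutoencoderTiny  # lazy import: heavy dependency

    return AutoencoderTiny.from_pretrained(TINY_VAE_REPOS[family])


def swap_in_tiny_vae(pipe, family="sd"):
    """Replace a loaded diffusers pipeline's VAE with the tiny one."""
    vae = load_tiny_vae(family)
    # Match the pipeline's device and dtype so decoding works in place.
    pipe.vae = vae.to(device=pipe.device, dtype=pipe.dtype)
    return pipe
```

With a pipeline already loaded, `swap_in_tiny_vae(pipe, "sdxl")` would be the only change needed; the rest of the generation call stays the same.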

Highlighted Details

  • Drastically reduced parameter count: ~1.2M each for the TAESD encoder and decoder, vs. ~34M (encoder) and ~49M (decoder) for the SD VAE.
  • Compatible with SD1/2, SDXL, SD3, and FLUX.1 models.
  • Enables real-time previewing of image generation progress.
  • Can be used as a standalone VAE for interactive generation or image-space loss functions.
  • Latents can be clipped and quantized to 8-bit PNGs with minimal quality loss.
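The clip-and-quantize step in the last bullet can be sketched as follows. This is an illustration, not the project's code: the clip range of ±3 (the `magnitude` parameter) and the function names are assumptions, since the summary does not state the range used. Each latent value is clipped, mapped linearly to [0, 1], and rounded to one of 256 levels, which is what storing it in an 8-bit PNG channel requires.

```python
def quantize_latent(value, magnitude=3.0):
    """Clip a latent value to [-magnitude, magnitude] and map it to 0..255."""
    unit = value / (2.0 * magnitude) + 0.5  # [-magnitude, magnitude] -> [0, 1]
    unit = min(1.0, max(0.0, unit))         # clip out-of-range values
    return round(unit * 255)


def dequantize_latent(byte, magnitude=3.0):
    """Invert quantize_latent, up to rounding error."""
    return (byte / 255.0 - 0.5) * 2.0 * magnitude


# Worst-case round-trip error for in-range values is half a quantization step.
half_step = (2.0 * 3.0 / 255) / 2
for v in (-3.0, -1.5, 0.7, 3.0):
    restored = dequantize_latent(quantize_latent(v))
    assert abs(restored - v) <= half_step + 1e-9
```

Values outside the clip range are saturated, so the round trip is only lossy-by-rounding for latents that fall inside it; a wider range loses less to clipping but makes each quantization step coarser.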

Maintenance & Community

The project is maintained by madebyollin. Integration into popular UIs like A1111 and ComfyUI suggests active community adoption.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

TAESD sacrifices fine detail for speed and size. Although its receptive field is bounded, tiled decoding is still not recommended: tiles that do not cover the full receptive field can produce visible seams.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 13 stars in the last 30 days
