taesd by madebyollin

Tiny AutoEncoder for Stable Diffusion latents

created 2 years ago
758 stars

Top 46.8% on sourcepulse

Project Summary

TAESD is a tiny autoencoder that decodes Stable Diffusion latents into full-size images at minimal computational cost. It targets Stable Diffusion users who need real-time previews or an efficient standalone VAE, and it runs much faster than the standard VAE at a modest cost in quality.

How It Works

TAESD is a distilled version of Stable Diffusion's VAE with a significantly smaller encoder and decoder. It is built from convolutional layers with ReLU activations, plus upsampling layers. This lets it process latents efficiently, trading fine-detail fidelity for speed and a much lower parameter count, which suits resource-constrained environments and real-time applications.
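To make the parameter-count trade-off concrete, here is a minimal sketch of how the size of a single convolution layer scales with its channel width. The widths used (64 and 512 channels) are hypothetical values picked for illustration only and are not taken from the TAESD or SD VAE source.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Weights plus biases of one 2D convolution layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


# Hypothetical channel widths, chosen only to show the scaling.
narrow = conv2d_params(64, 64)    # 36,928 parameters
wide = conv2d_params(512, 512)    # 2,359,808 parameters

print(f"narrow={narrow:,} wide={wide:,} ratio={wide / narrow:.1f}x")
```

Because the count grows with the product of input and output channels, a network built from narrow layers can be far smaller than one built from wide layers of the same depth.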

Quick Start & Requirements

  • Available for the diffusers library in safetensors format (models: taesd, taesdxl, taesd3, taef1).
  • Can be integrated into A1111 and ComfyUI.
  • Requires Python. Specific hardware requirements depend on the integration and model size.
  • Official examples and integration guides are available in the README.
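As a rough sketch of the diffusers route, the snippet below loads one of the tiny autoencoders through the `AutoencoderTiny` class. The repository ids under the `madebyollin/` namespace and the short family keys (`"sd"`, `"sdxl"`, `"sd3"`, `"flux"`) are assumptions made for this example; check the README for the exact names. The helper `load_tiny_vae` is hypothetical, not part of any library.

```python
# Assumed Hugging Face repo ids, one per model family listed above.
TAESD_REPOS = {
    "sd": "madebyollin/taesd",      # SD1/2
    "sdxl": "madebyollin/taesdxl",  # SDXL
    "sd3": "madebyollin/taesd3",    # SD3
    "flux": "madebyollin/taef1",    # FLUX.1
}


def load_tiny_vae(family: str = "sd"):
    """Load the tiny autoencoder for a model family (downloads weights)."""
    # Imported lazily so the mapping above is usable without torch/diffusers.
    import torch
    from diffusers import AutoencoderTiny

    return AutoencoderTiny.from_pretrained(
        TAESD_REPOS[family], torch_dtype=torch.float16
    )
```

With an existing diffusers pipeline `pipe`, the tiny model would typically replace the default one via `pipe.vae = load_tiny_vae("sd")`, using the family that matches the pipeline's base model.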

Highlighted Details

  • Drastically reduced parameter count: ~1.2M each for the TAESD encoder and decoder, vs. ~34M (encoder) and ~49M (decoder) for the SD VAE.
  • Compatible with SD1/2, SDXL, SD3, and FLUX.1 models.
  • Enables real-time previewing of image generation progress.
  • Can be used as a standalone VAE for interactive generation or image-space loss functions.
  • Latents can be clipped and quantized to 8-bit PNGs with minimal quality loss.
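The clip-and-quantize step in the last bullet can be sketched in plain Python as follows. The clipping range of ±3 (`LATENT_MAGNITUDE`) and the 0.5 shift are assumptions made for this example, and the function names are hypothetical; the project's own code may use different constants.

```python
LATENT_MAGNITUDE = 3.0  # assumed clipping range: latents in [-3, 3]
LATENT_SHIFT = 0.5      # assumed shift that centers 0.0 at mid-gray


def quantize_latent(x: float) -> int:
    """Map a raw latent value to an 8-bit integer in [0, 255]."""
    scaled = x / (2 * LATENT_MAGNITUDE) + LATENT_SHIFT
    clipped = min(max(scaled, 0.0), 1.0)  # values outside the range saturate
    return round(clipped * 255)


def dequantize_latent(q: int) -> float:
    """Map an 8-bit integer back to an approximate raw latent value."""
    return (q / 255 - LATENT_SHIFT) * (2 * LATENT_MAGNITUDE)


sample = [-4.2, -1.0, 0.3, 2.9]
restored = [dequantize_latent(quantize_latent(v)) for v in sample]
print(restored)
```

Under these assumptions, in-range values come back with an error of at most half a quantization step (6 / 255 / 2, about 0.012), while out-of-range values such as -4.2 are clipped to the boundary.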

Maintenance & Community

The project is maintained by madebyollin. Integration into popular UIs like A1111 and ComfyUI suggests active community adoption.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

TAESD sacrifices fine-detail quality for speed and size. Although its receptive field is bounded, tiled decoding is still not recommended because of potential seam issues related to receptive field coverage.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 51 stars in the last 90 days
