taesd by madebyollin

Tiny AutoEncoder for Stable Diffusion latents

Created 2 years ago

854 stars

Top 41.9% on SourcePulse

5 Experts Love This Project

Tobi Lutke

Cofounder of Shopify

Christian Laforte

Distinguished Engineer at NVIDIA; Former CTO at Stability AI

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

Jiaming Song

Chief Scientist at Luma AI

and 1 more!

Project Summary

TAESD is a tiny autoencoder that decodes Stable Diffusion latents into full-size images at minimal computational cost. It targets Stable Diffusion users who need real-time previews or an efficient standalone VAE, and it runs much faster than the standard VAE at a modest cost in quality.

How It Works

TAESD is a distilled version of Stable Diffusion's VAE with a much smaller encoder and decoder, built from convolutional layers with ReLU activations plus upsampling layers. The design trades fine-detail fidelity for speed and a lower parameter count, which suits resource-constrained environments and real-time applications.
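To make the decoder's data flow concrete, here is a minimal shape-tracing sketch of a conv, ReLU, and upsample stack. It does not load or run the real model. The 4-channel latent at 1/8 resolution, the 64 hidden channels, and the three 2x upsampling stages are assumptions chosen for illustration, and `decoder_shapes` is a hypothetical helper name.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def decoder_shapes(
    latent: Shape,
    hidden: int = 64,       # assumed hidden channel width
    stages: int = 3,        # assumed number of 2x upsampling stages (8x total)
    out_channels: int = 3,  # RGB output
) -> List[Shape]:
    """Trace tensor shapes through a conv -> ReLU -> upsample decoder."""
    _, height, width = latent
    shapes = [latent]
    # Input convolution widens the channels; ReLU leaves the shape unchanged.
    shapes.append((hidden, height, width))
    for _ in range(stages):
        # Each upsampling layer doubles height and width.
        height, width = height * 2, width * 2
        shapes.append((hidden, height, width))
    # Final convolution maps the hidden channels to image channels.
    shapes.append((out_channels, height, width))
    return shapes
```

Under these assumptions, a `(4, 64, 64)` latent ends as a `(3, 512, 512)` image after three doublings.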

Quick Start & Requirements

Available through the diffusers library in safetensors format (checkpoints: taesd, taesdxl, taesd3, taef1).
Can be integrated into A1111 and ComfyUI.
Requires Python. Specific hardware requirements depend on the integration and model size.
Official examples and integration guides are available in the README.
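As a minimal sketch of the diffusers route, the helper below maps a model family to one of the checkpoint names listed above and loads it with diffusers' `AutoencoderTiny` class. The `madebyollin/` Hugging Face repository prefix, the family keys, and the helper names `checkpoint_for` and `load_tiny_vae` are assumptions for illustration; the README remains the authoritative reference.

```python
# Assumed mapping from model family to Hugging Face repo id (not stated on this page).
TINY_VAE_CHECKPOINTS = {
    "sd": "madebyollin/taesd",      # SD1/2
    "sdxl": "madebyollin/taesdxl",  # SDXL
    "sd3": "madebyollin/taesd3",    # SD3
    "flux": "madebyollin/taef1",    # FLUX.1
}


def checkpoint_for(family: str) -> str:
    """Return the assumed repo id for a model family, or raise ValueError."""
    try:
        return TINY_VAE_CHECKPOINTS[family]
    except KeyError:
        known = ", ".join(sorted(TINY_VAE_CHECKPOINTS))
        raise ValueError(f"unknown family {family!r}; expected one of: {known}") from None


def load_tiny_vae(family: str):
    """Download and return the tiny autoencoder for `family` (needs network access)."""
    repo_id = checkpoint_for(family)
    # Imported lazily so the mapping above is usable without diffusers installed.
    from diffusers import AutoencoderTiny

    return AutoencoderTiny.from_pretrained(repo_id)
```

With an already-loaded diffusers pipeline `pipe` of the matching family, the tiny autoencoder could then be swapped in with something like `pipe.vae = load_tiny_vae("sd")`, after moving it to the same device and dtype as the pipeline.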

Highlighted Details

Drastically reduced parameter count: ~1.2M each for the encoder and decoder, vs. ~34M (encoder) and ~49M (decoder) for the SD VAE.
Compatible with SD1/2, SDXL, SD3, and FLUX.1 models.
Enables real-time previewing of image generation progress.
Can be used as a standalone VAE for interactive generation or image-space loss functions.
Latents can be clipped and quantized to 8-bit PNGs with minimal quality loss.
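The clip-and-quantize step in the last item can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: the clipping magnitude of 3 is an assumed value, and `quantize` and `dequantize` are hypothetical names. Writing the resulting codes to a PNG would need an image library and is left out.

```python
LATENT_MAGNITUDE = 3.0  # assumed clipping range: values outside [-3, 3] are clipped
LEVELS = 255            # 8-bit channel: integer codes 0..255


def quantize(value: float) -> int:
    """Clip a latent value to [-LATENT_MAGNITUDE, LATENT_MAGNITUDE] and map it to 0..255."""
    unit = value / (2 * LATENT_MAGNITUDE) + 0.5  # [-3, 3] -> [0, 1]
    unit = min(1.0, max(0.0, unit))              # clip out-of-range values
    return round(unit * LEVELS)


def dequantize(code: int) -> float:
    """Map an 8-bit code back to an approximate latent value."""
    return (code / LEVELS - 0.5) * 2 * LATENT_MAGNITUDE
```

For values inside the clipping range, the round trip is off by at most half a quantization step, that is, `LATENT_MAGNITUDE / LEVELS`. Values outside the range are pinned to the nearest endpoint.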

Maintenance & Community

The project is maintained by madebyollin. Integration into popular UIs like A1111 and ComfyUI suggests active community adoption.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

TAESD sacrifices fine-detail quality for speed and size. Although its receptive field is bounded, tiled decoding is still not recommended, because seams can appear depending on how the tiles cover the receptive field.

Health Check

Last Commit

2 weeks ago

Responsiveness

Inactive

Star History

18 stars in the last 30 days