tomesd by dbolya

Speed-up tool for Stable Diffusion

Created 2 years ago
1,372 stars

Top 29.4% on SourcePulse

View on GitHub
Project Summary

ToMe for SD offers a method to accelerate Stable Diffusion inference by merging redundant tokens within transformer blocks, reducing computational load. This approach is designed for users of Stable Diffusion models seeking faster generation times and lower memory consumption without requiring model retraining.

How It Works

ToMe for SD applies a novel token merging strategy to Stable Diffusion's transformer components. By intelligently merging tokens, it reduces the number of operations the model performs, leading to significant speedups and memory savings. This method is designed to minimize quality degradation, even with aggressive merging ratios, and can be combined with other optimization techniques like xFormers.
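
For intuition, here is a toy PyTorch sketch of the underlying idea: find the most mutually similar token pairs inside a transformer block and average them together, so attention runs over fewer tokens. This is a simplified illustration only, not the implementation tomesd ships (which follows the ToMe paper's bipartite soft matching); the function name and shapes are invented for the example.

    import torch

    def merge_tokens(x: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
        """Toy illustration: merge the most similar token pairs by averaging.

        x has shape (batch, tokens, channels). This approximates the spirit of
        token merging but is not the algorithm tomesd implements.
        """
        b, n, c = x.shape
        r = int((n // 2) * ratio)  # number of tokens to merge away
        if r <= 0:
            return x

        # Split tokens into two alternating sets and compare by cosine similarity.
        set_a, set_b = x[:, ::2], x[:, 1::2]
        a_n = set_a / set_a.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        b_n = set_b / set_b.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        scores = a_n @ b_n.transpose(-1, -2)  # (batch, |A|, |B|)

        # For each token in A, find its best match in B, then pick the r best pairs.
        best_val, best_idx = scores.max(dim=-1)
        order = best_val.argsort(dim=-1, descending=True)
        src_idx = order[:, :r]                   # A-tokens to merge away
        dst_idx = best_idx.gather(-1, src_idx)   # their destinations in B

        # Average each merged A-token into its destination B-token.
        src = set_a.gather(1, src_idx.unsqueeze(-1).expand(-1, -1, c))
        merged_b = set_b.scatter_reduce(
            1, dst_idx.unsqueeze(-1).expand(-1, -1, c), src, reduce="mean"
        )

        # Keep the unmerged A-tokens plus the (partially merged) B-tokens.
        kept_a = set_a.gather(1, order[:, r:].unsqueeze(-1).expand(-1, -1, c))
        return torch.cat([kept_a, merged_b], dim=1)

    # Example: 4096 tokens merged at ratio=0.5 leaves 3072 tokens.
    x = torch.randn(1, 4096, 320)
    print(merge_tokens(x, ratio=0.5).shape)  # torch.Size([1, 3072, 320])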

Quick Start & Requirements

  • Install via pip: pip install tomesd
  • Requires PyTorch >= 1.12.1.
  • Supports Stable Diffusion v1, v2, Latent Diffusion, and Diffusers pipelines.
  • Setup is minimal, involving a simple Python patch (see the sketch after this list).
  • Official documentation and examples are available.
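
The "simple Python patch" typically looks like the sketch below. apply_patch, remove_patch, and the ratio argument are tomesd's documented entry points; the checkpoint id and prompt are placeholders, and the project README covers the full set of options.

    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    # Load any supported Stable Diffusion pipeline (example checkpoint id).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Patch the pipeline in place; ratio is the fraction of tokens to merge.
    tomesd.apply_patch(pipe, ratio=0.5)

    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("astronaut.png")

    # The patch can be undone to restore the original model behavior.
    tomesd.remove_patch(pipe)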

Highlighted Details

  • Achieves up to 2x speedup and ~5.7x less memory usage with 60% token merging.
  • Minimal quality loss, with FID scores remaining close to baseline even at high merge ratios.
  • Implemented in pure Python, requiring no CUDA compilation.
  • Compatible with efficient transformer implementations like xFormers and Flash Attention (see the sketch after this list).
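
A hedged sketch of how the aggressive 60% setting combines with xFormers: enable_xformers_memory_efficient_attention is the standard Diffusers toggle (it requires the xformers package to be installed), and the two optimizations can be combined.

    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Memory-efficient attention from xFormers (installed separately).
    pipe.enable_xformers_memory_efficient_attention()

    # The 60% merge ratio behind the ~2x speed / ~5.7x memory figures above.
    tomesd.apply_patch(pipe, ratio=0.6)

    image = pipe("a castle on a cliff at sunset").images[0]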

Maintenance & Community

  • The project was created and is maintained by Daniel Bolya and Judy Hoffman.
  • Available via pip since April 2023, with support for Diffusers pipelines.
  • Citations are provided for both the Stable-Diffusion-specific work (ToMe for SD) and the original ToMe paper.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility with commercial or closed-source applications is not specified.

Limitations & Caveats

  • The process is lossy: images generated with merging enabled will differ from those the unpatched model produces.
  • The speed-up may be less pronounced on the first few generations while PyTorch finishes its graph setup.
  • Consistent results across batches require manually setting seeds, because the merging process is randomized (see the sketch after this list).
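
A sketch of the seed-setting workaround, assuming the Diffusers setup from the Quick Start example: the generator argument pins the diffusion noise, and resetting PyTorch's global RNG is assumed here to also pin the randomized merge pattern.

    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    tomesd.apply_patch(pipe, ratio=0.5)

    prompt = "a watercolor painting of a lighthouse"
    for i in range(2):
        torch.manual_seed(42)  # assumed to drive the randomized merging
        generator = torch.Generator("cuda").manual_seed(42)  # pins the diffusion noise
        pipe(prompt, generator=generator).images[0].save(f"lighthouse_{i}.png")
    # With both seeds fixed, the two saved images should be (near-)identical.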

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 12 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), and 3 more.

taesd by madebyollin

0.3%
779
Tiny AutoEncoder for Stable Diffusion latents
Created 2 years ago
Updated 5 months ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 1 more.

diffusion by mosaicml

0%
707
Diffusion model training code
Created 2 years ago
Updated 8 months ago
Starred by Chaoyu Yang (Founder of Bento), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

nunchaku by nunchaku-tech

1.9%
3k
High-performance 4-bit diffusion model inference engine
Created 10 months ago
Updated 2 days ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 5 more.

matmulfreellm by ridgerchu

0.0%
3k
MatMul-free language models
Created 1 year ago
Updated 1 month ago