tomesd by dbolya

Speed-up tool for Stable Diffusion

Created 2 years ago
1,372 stars

Top 29.4% on SourcePulse

View on GitHub
Project Summary

ToMe for SD offers a method to accelerate Stable Diffusion inference by merging redundant tokens within transformer blocks, reducing computational load. This approach is designed for users of Stable Diffusion models seeking faster generation times and lower memory consumption without requiring model retraining.

How It Works

ToMe for SD applies a novel token merging strategy to Stable Diffusion's transformer components. By intelligently merging tokens, it reduces the number of operations the model performs, leading to significant speedups and memory savings. This method is designed to minimize quality degradation, even with aggressive merging ratios, and can be combined with other optimization techniques like xFormers.
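
For intuition, here is a toy PyTorch sketch of the underlying idea: find the most mutually similar token pairs inside a transformer block and average them together, so attention runs over fewer tokens. This is a simplified illustration only, not the implementation tomesd ships (which follows the ToMe paper's bipartite soft matching); the function name and shapes are invented for the example.

    import torch

    def merge_tokens(x: torch.Tensor, ratio: float = 0.5) -> torch.Tensor:
        """Toy illustration: merge the most similar token pairs by averaging.

        x has shape (batch, tokens, channels). This approximates the spirit of
        token merging but is not the algorithm tomesd implements.
        """
        b, n, c = x.shape
        r = int((n // 2) * ratio)  # number of tokens to merge away
        if r <= 0:
            return x

        # Split tokens into two alternating sets and compare by cosine similarity.
        set_a, set_b = x[:, ::2], x[:, 1::2]
        a_n = set_a / set_a.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        b_n = set_b / set_b.norm(dim=-1, keepdim=True).clamp_min(1e-8)
        scores = a_n @ b_n.transpose(-1, -2)  # (batch, |A|, |B|)

        # For each token in A, find its best match in B, then pick the r best pairs.
        best_val, best_idx = scores.max(dim=-1)
        order = best_val.argsort(dim=-1, descending=True)
        src_idx = order[:, :r]                   # A-tokens to merge away
        dst_idx = best_idx.gather(-1, src_idx)   # their destinations in B

        # Average each merged A-token into its destination B-token.
        src = set_a.gather(1, src_idx.unsqueeze(-1).expand(-1, -1, c))
        merged_b = set_b.scatter_reduce(
            1, dst_idx.unsqueeze(-1).expand(-1, -1, c), src, reduce="mean"
        )

        # Keep the unmerged A-tokens plus the (partially merged) B-tokens.
        kept_a = set_a.gather(1, order[:, r:].unsqueeze(-1).expand(-1, -1, c))
        return torch.cat([kept_a, merged_b], dim=1)

    # Example: 4096 tokens merged at ratio=0.5 leaves 3072 tokens.
    x = torch.randn(1, 4096, 320)
    print(merge_tokens(x, ratio=0.5).shape)  # torch.Size([1, 3072, 320])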

Quick Start & Requirements

  • Install via pip: pip install tomesd
  • Requires PyTorch >= 1.12.1.
  • Supports Stable Diffusion v1, v2, Latent Diffusion, and Diffusers pipelines.
  • Setup is minimal, involving a simple Python patch (see the sketch after this list).
  • Official documentation and examples are available.
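
The "simple Python patch" typically looks like the sketch below. apply_patch, remove_patch, and the ratio argument are tomesd's documented entry points; the checkpoint id and prompt are placeholders, and the project README covers the full set of options.

    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    # Load any supported Stable Diffusion pipeline (example checkpoint id).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Patch the pipeline in place; ratio is the fraction of tokens to merge.
    tomesd.apply_patch(pipe, ratio=0.5)

    image = pipe("a photo of an astronaut riding a horse").images[0]
    image.save("astronaut.png")

    # The patch can be undone to restore the original model behavior.
    tomesd.remove_patch(pipe)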

Highlighted Details

  • Achieves up to 2x speedup and ~5.7x less memory usage with 60% token merging.
  • Minimal quality loss, with FID scores remaining close to baseline even at high merge ratios.
  • Implemented in pure Python, requiring no CUDA compilation.
  • Compatible with efficient transformer implementations like xFormers and Flash Attention (see the sketch after this list).
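
A hedged sketch of how the aggressive 60% setting combines with xFormers: enable_xformers_memory_efficient_attention is the standard Diffusers toggle (it requires the xformers package to be installed), and the two optimizations can be combined.

    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Memory-efficient attention from xFormers (installed separately).
    pipe.enable_xformers_memory_efficient_attention()

    # The 60% merge ratio behind the ~2x speed / ~5.7x memory figures above.
    tomesd.apply_patch(pipe, ratio=0.6)

    image = pipe("a castle on a cliff at sunset").images[0]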

Maintenance & Community

  • The project was created and is maintained by Daniel Bolya and Judy Hoffman.
  • Available via pip since April 2023, with support for Diffusers pipelines.
  • Citations are provided for both the Stable-Diffusion-specific work (ToMe for SD) and the original ToMe paper.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility with commercial or closed-source applications is not specified.

Limitations & Caveats

  • The process is lossy: images generated with merging enabled will differ from those the unpatched model produces.
  • The speed-up may be less pronounced on the first few generations while PyTorch finishes its graph setup.
  • Consistent results across batches require manually setting seeds, because the merging process is randomized (see the sketch after this list).
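
A sketch of the seed-setting workaround, assuming the Diffusers setup from the Quick Start example: the generator argument pins the diffusion noise, and resetting PyTorch's global RNG is assumed here to also pin the randomized merge pattern.

    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    tomesd.apply_patch(pipe, ratio=0.5)

    prompt = "a watercolor painting of a lighthouse"
    for i in range(2):
        torch.manual_seed(42)  # assumed to drive the randomized merging
        generator = torch.Generator("cuda").manual_seed(42)  # pins the diffusion noise
        pipe(prompt, generator=generator).images[0].save(f"lighthouse_{i}.png")
    # With both seeds fixed, the two saved images should be (near-)identical.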

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 12 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), and 3 more.

taesd by madebyollin

0.3%
779
Tiny AutoEncoder for Stable Diffusion latents
Created 2 years ago
Updated 5 months ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 1 more.

diffusion by mosaicml

0%
707
Diffusion model training code
Created 2 years ago
Updated 8 months ago
Starred by Chaoyu Yang (Founder of Bento), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

nunchaku by nunchaku-tech

1.9%
3k
High-performance 4-bit diffusion model inference engine
Created 10 months ago
Updated 2 days ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 5 more.

matmulfreellm by ridgerchu

0.0%
3k
MatMul-free language models
Created 1 year ago
Updated 1 month ago