onediff by siliconflow

Acceleration library for diffusion models

Created 3 years ago
1,938 stars

Top 22.7% on SourcePulse

View on GitHub
Project Summary

OneDiff is an acceleration library that speeds up diffusion models for users of ComfyUI, Hugging Face Diffusers, and other popular interfaces. It offers PyTorch code compilation and optimized GPU kernels, aiming to provide significant performance gains with minimal code changes.

How It Works

OneDiff leverages PyTorch module compilation, specifically through its OneFlow backend or the optional Nexfort compiler. This process compiles PyTorch code into optimized kernels, reducing overhead from dynamic Python execution and enabling faster inference. The compilation can be done offline and the results loaded for online serving, supporting dynamic input shapes without recompilation penalties.
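
Concretely, the offline-compile-then-reload workflow looks roughly like the sketch below. This is a minimal illustration, assuming the onediffx helpers compile_pipe, save_pipe, and load_pipe and the SDXL base model behave as in the project's Diffusers examples:

    import torch
    from diffusers import StableDiffusionXLPipeline
    from onediffx import compile_pipe, save_pipe, load_pipe  # assumed onediffx helpers

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    pipe = compile_pipe(pipe)            # compile the pipeline's PyTorch modules
    pipe("warm-up prompt")               # the first run triggers the actual compilation
    save_pipe(pipe, dir="cached_pipe")   # persist the compiled artifacts offline

    # Later, in a serving process: rebuild the pipeline, then load the cached
    # compilation instead of paying the compile cost again.
    pipe = load_pipe(pipe, dir="cached_pipe")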

Quick Start & Requirements

  • Installation: python3 -m pip install --pre onediff (or from source for plugins).
  • Prerequisites: PyTorch, Hugging Face Diffusers, and a compiler backend (OneFlow or Nexfort). Requires an NVIDIA GPU (e.g., RTX 3090, RTX 4090, A100, A800, A10). OneFlow supports CUDA 11.8, 12.1, and 12.2.
  • Setup: install PyTorch and diffusers, then a compiler backend, and finally OneDiff itself; a minimal usage sketch follows this list.
  • Documentation: available in the project repository.
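
As a quick-start illustration, here is a minimal sketch of accelerating a Diffusers pipeline, assuming the OneFlow backend is installed and onediff.infer_compiler.oneflow_compile works as in the project's examples:

    import torch
    from diffusers import StableDiffusionPipeline
    from onediff.infer_compiler import oneflow_compile  # assumed per the project's Diffusers examples

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Compile the UNet, the main compute hot spot, into optimized kernels.
    pipe.unet = oneflow_compile(pipe.unet)

    # The first call triggers compilation; later calls reuse the compiled graph.
    image = pipe("an astronaut riding a horse on the moon").images[0]
    image.save("astronaut.png")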

Highlighted Details

  • Up to 1.7x speedup reported for Kolors, DiT, SD3, PixArt, and Latte models.
  • Supports acceleration for SD 1.5-XL, SDXL Turbo, LCM, LoRA, ControlNet, SVD, and InstantID.
  • Integrates with ComfyUI, Hugging Face Diffusers, and Stable Diffusion web UI.
  • Offers features like dynamic image size support and fast LoRA switching.

Maintenance & Community

  • Active development with recent updates for DiT and Kolors acceleration.
  • Community support via Discord and GitHub Issues.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.

Limitations & Caveats

  • Windows support is limited to WSL.
  • Compatibility with Ascend GPUs is in progress.
  • Some features, such as SDXL DeepCache, are in alpha status.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 18 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
790
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Jiaming Song (Chief Scientist at Luma AI).

tomesd by dbolya

0.3%
1k
Speed-up tool for Stable Diffusion
Created 2 years ago
Updated 1 year ago
Starred by Chaoyu Yang (Founder of Bento), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

nunchaku by nunchaku-tech

1.9%
3k
High-performance 4-bit diffusion model inference engine
Created 10 months ago
Updated 2 days ago
Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI) and Cody Yu (Coauthor of vLLM; MTS at OpenAI).

xDiT by xdit-project

0.7%
2k
Inference engine for parallel Diffusion Transformer (DiT) deployment
Created 1 year ago
Updated 1 day ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

0.1%
6k
Optimized transformer library for inference
Created 4 years ago
Updated 1 year ago