stable-diffusion.cpp  by leejet

C/C++ inference for Stable Diffusion models

created 2 years ago
4,270 stars

Top 11.7% on sourcepulse

Project Summary

This project provides a pure C/C++ implementation of Stable Diffusion and Flux inference, targeting developers and power users who want a lightweight solution with minimal dependencies. It offers broad model support (SD1.x, SD2.x, SDXL, SD3/3.5, Flux), integer quantization at several bit widths (2-bit through 8-bit), and hardware acceleration on CPU (AVX, AVX2, AVX512) as well as on GPU through CUDA, Metal, Vulkan, and SYCL.
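
To make the quantization idea concrete, here is a minimal sketch of block-wise 8-bit integer quantization, loosely modeled on the style of ggml's block formats. It is an illustration only, not the project's code: the names `BlockQ8`, `quantize_q8`, and `dequantize_q8` are hypothetical, the block size of 32 is an assumption, and the scale is kept as a `float` for simplicity.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed block size: every 32 weights share one scale factor.
constexpr std::size_t kBlockSize = 32;

// Hypothetical block layout for this sketch.
struct BlockQ8 {
    float scale;                            // dequantized value = scale * q[i]
    std::array<std::int8_t, kBlockSize> q;  // quantized weights in [-127, 127]
};

// Quantize a weight vector whose length is a multiple of kBlockSize.
std::vector<BlockQ8> quantize_q8(const std::vector<float>& w) {
    std::vector<BlockQ8> blocks(w.size() / kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        const float* src = w.data() + b * kBlockSize;
        // The largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockSize; ++i)
            amax = std::max(amax, std::fabs(src[i]));
        const float scale = amax / 127.0f;
        const float inv = scale > 0.0f ? 1.0f / scale : 0.0f;
        blocks[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; ++i)
            blocks[b].q[i] = static_cast<std::int8_t>(std::lround(src[i] * inv));
    }
    return blocks;
}

// Reconstruct approximate float weights from the quantized blocks.
std::vector<float> dequantize_q8(const std::vector<BlockQ8>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const BlockQ8& blk : blocks)
        for (std::int8_t q : blk.q)
            out.push_back(blk.scale * static_cast<float>(q));
    return out;
}
```

Lower bit widths follow the same pattern with a smaller integer range per weight, which shrinks the model further at the cost of a larger rounding error.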

How It Works

Built upon the ggml library, similar to llama.cpp, this project prioritizes efficiency and minimal dependencies. It supports loading models directly from common formats like .ckpt, .safetensors, and Hugging Face diffusers, eliminating the need for conversion to .ggml or .gguf. Key features include Flash Attention for memory optimization, LoRA support, Latent Consistency Models (LCM), and ESRGAN upscaling.
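
As a rough illustration of what LoRA support involves, the sketch below merges a low-rank update into a base weight matrix using the standard LoRA formula W' = W + (alpha / rank) * up * down. The `Matrix` type and `merge_lora` function are hypothetical helpers written for this example; how the project actually applies LoRA weights is not described here.

```cpp
#include <cstddef>
#include <vector>

// Row-major dense matrix; a hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Merge a LoRA update into a base weight in place:
//   W += (alpha / rank) * up * down
// where W is (out x in), up is (out x rank), and down is (rank x in).
void merge_lora(Matrix& w, const Matrix& up, const Matrix& down, float alpha) {
    const std::size_t rank = down.rows;
    const float scale = alpha / static_cast<float>(rank);
    for (std::size_t o = 0; o < w.rows; ++o) {
        for (std::size_t i = 0; i < w.cols; ++i) {
            float delta = 0.0f;
            for (std::size_t k = 0; k < rank; ++k)
                delta += up.at(o, k) * down.at(k, i);
            w.at(o, i) += scale * delta;
        }
    }
}
```

Because the update is the product of two thin matrices, a LoRA file stays small relative to the base model while still adjusting every entry of the merged weight.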

Quick Start & Requirements

  • Install/Run: Download pre-built executables from releases or build from source using CMake.
  • Prerequisites (only for the backend you build): CUDA Toolkit (for CUDA acceleration), ROCm toolkit (for HipBLAS), Vulkan SDK (for Vulkan), Intel® oneAPI Base Toolkit (for SYCL). Recommended VRAM: 4 GB or more.
  • Setup: To build from source, configure the project with cmake, then compile it with cmake --build.
  • Docs: https://github.com/leejet/stable-diffusion.cpp

Highlighted Details

  • Supports SDXL VAE FP16 NaN fix.
  • Offers 2-bit, 3-bit, 4-bit, 5-bit, and 8-bit integer quantization.
  • Enables GPU acceleration via CUDA, Metal, Vulkan, and SYCL.
  • Integrates ControlNet and PhotoMaker, plus TAESD for faster latent decoding.
  • Embeds generation parameters into PNG output compatible with stable-diffusion-webui.
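
The parameter embedding in the last item can be sketched as follows. This example assumes the text is stored in a PNG tEXt chunk under the keyword "parameters", which is the convention stable-diffusion-webui is generally understood to read; the function names are hypothetical and this is not the project's own writer. The chunk layout and CRC-32 follow the PNG specification.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// CRC-32 as used by PNG (reflected polynomial 0xEDB88320).
std::uint32_t png_crc32(const std::vector<std::uint8_t>& bytes) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::uint8_t b : bytes) {
        crc ^= b;
        for (int k = 0; k < 8; ++k)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

// Append a 32-bit value in big-endian byte order, as PNG requires.
void append_be32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Build a complete tEXt chunk: length | "tEXt" | keyword \0 text | CRC.
std::vector<std::uint8_t> make_text_chunk(const std::string& keyword,
                                          const std::string& text) {
    // The CRC covers the chunk type and the data, but not the length field.
    std::vector<std::uint8_t> body = {'t', 'E', 'X', 't'};
    body.insert(body.end(), keyword.begin(), keyword.end());
    body.push_back(0);  // null separator between keyword and text
    body.insert(body.end(), text.begin(), text.end());

    std::vector<std::uint8_t> chunk;
    // The length field counts only the data bytes, not the 4-byte type.
    append_be32(chunk, static_cast<std::uint32_t>(body.size() - 4));
    chunk.insert(chunk.end(), body.begin(), body.end());
    append_be32(chunk, png_crc32(body));
    return chunk;
}
```

A writer would place the resulting bytes between the IHDR and IEND chunks of the output file, for example `make_text_chunk("parameters", "a cat, Steps: 20")`, where the text string is an invented placeholder.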

Maintenance & Community

The README lists the project's contributors and references related projects such as ggml, stable-diffusion-webui, and ComfyUI.

Licensing & Compatibility

The project is licensed under the MIT License, allowing for commercial use and integration with closed-source applications.

Limitations & Caveats

The README notes that the current ggml_conv_2d implementation is slow and memory-intensive, with optimization work ongoing. The Metal backend is currently inefficient with very large matrices. Flash Attention may lower output quality on some backends and can cause crashes where it is unsupported. Inpainting support is listed as a future TODO.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 21
  • Issues (30d): 16
  • Star History: 239 stars in the last 90 days
