stable-diffusion.cpp by leejet

C/C++ inference for Stable Diffusion models

Created 2 years ago

5,119 stars

Top 9.7% on SourcePulse


9 Experts Love This Project

Cofounder of Replicate

Salvatore Sanfilippo

Author of Redis

and 5 more!

Project Summary

This project provides a pure C/C++ implementation for Stable Diffusion and Flux inference, targeting developers and power users seeking a lightweight, dependency-free solution. It offers broad model support (SD1.x, SD2.x, SDXL, SD3/3.5, Flux), various quantization levels (2-bit to 8-bit integer), and extensive hardware acceleration options including CPU (AVX, AVX2, AVX512), CUDA, Metal, Vulkan, and SYCL.
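To give a sense of what the integer quantization levels mean in practice, here is a minimal Python sketch of block-wise quantization. It is a simplified illustration, not ggml's actual storage format: the block size of 32, the symmetric 16-bit scale, and the function names `quantize_block`, `dequantize_block`, and `bits_per_weight` are all assumptions made for the example.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # assumed block length for this illustration


def quantize_block(values: List[float], bits: int = 4) -> Tuple[float, List[int]]:
    """Quantize one block to signed integers that share a single float scale."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale: float, quants: List[int]) -> List[float]:
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * q for q in quants]


def bits_per_weight(bits: int, block_size: int = BLOCK_SIZE, scale_bits: int = 16) -> float:
    """Effective storage cost per weight: payload bits plus the amortized scale."""
    return bits + scale_bits / block_size


if __name__ == "__main__":
    block = [(-1) ** i * i / BLOCK_SIZE for i in range(BLOCK_SIZE)]
    scale, quants = quantize_block(block, bits=4)
    restored = dequantize_block(scale, quants)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"scale={scale:.4f} worst-case error={worst:.4f}")
    print(f"effective bits per weight: {bits_per_weight(4)}")
```

Lower bit widths shrink the file and memory footprint at the cost of a larger rounding error per weight, which is the trade-off behind choosing between the 2-bit and 8-bit ends of the range.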

How It Works

Built upon the ggml library, similar to llama.cpp, this project prioritizes efficiency and minimal dependencies. It supports loading models directly from common formats like .ckpt, .safetensors, and Hugging Face diffusers, eliminating the need for conversion to .ggml or .gguf. Key features include Flash Attention for memory optimization, LoRA support, Latent Consistency Models (LCM), and ESRGAN upscaling.
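Part of why a .safetensors file can be read directly is that the container is simple: an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape, and byte offsets, then the raw tensor data. The Python sketch below builds a tiny file in memory and parses it again. It illustrates the file format only and is not how stable-diffusion.cpp implements loading; the helper names and the tensor name `example.weight` are made up for the example.

```python
import json
import struct
from typing import Dict


def build_safetensors(tensors: Dict[str, dict], payload: bytes) -> bytes:
    """Assemble a minimal safetensors blob: u64 length, JSON header, raw data."""
    header = json.dumps(tensors, separators=(",", ":")).encode("utf-8")
    return struct.pack("<Q", len(header)) + header + payload


def read_header(blob: bytes) -> Dict[str, dict]:
    """Parse only the JSON header; the tensor bytes are left untouched."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    return json.loads(blob[8 : 8 + header_len].decode("utf-8"))


def tensor_bytes(blob: bytes, name: str) -> bytes:
    """Slice one tensor's raw bytes; data_offsets are relative to the data section."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    begin, end = read_header(blob)[name]["data_offsets"]
    data_start = 8 + header_len
    return blob[data_start + begin : data_start + end]


if __name__ == "__main__":
    weights = struct.pack("<4f", 0.5, -1.0, 2.0, 0.0)
    meta = {"example.weight": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, 16]}}
    blob = build_safetensors(meta, weights)
    for name, info in read_header(blob).items():
        print(name, info["dtype"], info["shape"])
    print(struct.unpack("<4f", tensor_bytes(blob, "example.weight")))
```

Because the header alone describes every tensor, a loader can map names to weights without an intermediate conversion step. Real files may also carry a `__metadata__` entry in the header, which this sketch does not handle specially.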

Quick Start & Requirements

Install/Run: Download pre-built executables from releases or build from source using CMake.
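Prerequisites (only for the backend you enable): CUDA Toolkit (for CUDA acceleration), ROCm toolkit (for HipBLAS), Vulkan SDK (for Vulkan), Intel® oneAPI Base Toolkit (for SYCL). Recommended VRAM: 4 GB or more.
Setup: Building from source is a two-step CMake flow: configure the build with cmake, then compile with cmake --build.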
Docs: https://github.com/leejet/stable-diffusion.cpp

Highlighted Details

Supports SDXL VAE FP16 NaN fix.
Offers 2-bit, 3-bit, 4-bit, 5-bit, and 8-bit integer quantization.
Enables GPU acceleration via CUDA, Metal, Vulkan, and SYCL.
Integrates ControlNet and PhotoMaker, plus TAESD for faster latent decoding.
Embeds generation parameters into PNG output compatible with stable-diffusion-webui.
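The PNG parameter embedding mentioned above can be pictured as a text chunk inside the image file. stable-diffusion-webui reads generation settings from a PNG text chunk keyed `parameters`; the Python sketch below writes and reads such a `tEXt` chunk by hand. It is an illustration of the PNG mechanism under that assumption, not the project's own writer, and the helper names and sample parameter string are invented for the example.

```python
import struct
import zlib
from typing import Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """PNG chunk layout: big-endian length, type, data, CRC32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_text_chunk(keyword: str, text: str) -> bytes:
    """tEXt payload: Latin-1 keyword, a NUL separator, then Latin-1 text."""
    return make_chunk(b"tEXt", keyword.encode("latin-1") + b"\x00" + text.encode("latin-1"))


def build_png_with_parameters(parameters: str) -> bytes:
    """Build a 1x1 grayscale PNG that carries a 'parameters' text chunk."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # 1x1, 8-bit grayscale
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_text_chunk("parameters", parameters)
        + make_chunk(b"IDAT", idat)
        + make_chunk(b"IEND", b"")
    )


def read_text_chunks(png: bytes) -> Dict[str, str]:
    """Walk the chunk list and collect every tEXt keyword/value pair."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, found = len(PNG_SIGNATURE), {}
    while pos < len(png):
        length, chunk_type = struct.unpack_from(">I4s", png, pos)
        data = png[pos + 8 : pos + 8 + length]
        if chunk_type == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # length field + type + data + CRC
    return found


if __name__ == "__main__":
    # Illustrative parameter text only; the exact fields a tool writes may differ.
    png = build_png_with_parameters("a lovely cat\nSteps: 20, CFG scale: 7, Seed: 42")
    print(read_text_chunks(png)["parameters"])
```

Since the settings travel inside the image itself, any tool that understands the same key can recover the prompt and sampler settings from the file alone.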

Maintenance & Community

The README lists the project's contributors and references related projects such as ggml, stable-diffusion-webui, and ComfyUI.

Licensing & Compatibility

The project is licensed under the MIT License, allowing for commercial use and integration with closed-source applications.

Limitations & Caveats

The README notes that the current ggml_conv_2d implementation is slow and memory-intensive, with ongoing efforts to optimize it. The Metal backend is currently inefficient with very large matrices. Flash Attention may lower quality on some backends and can cause crashes where it is unsupported. Inpainting support is listed as a future TODO.
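One common reason a 2D convolution becomes memory-hungry is an im2col-style implementation, which unrolls every input patch into a large temporary matrix before a matrix multiply. The Python sketch below estimates the size of that temporary buffer. It assumes the im2col strategy and 2-byte (FP16) elements purely for illustration; this summary does not state how ggml_conv_2d works internally, and the function names are hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1,
                     padding: int = 0, dilation: int = 1) -> int:
    """Standard output-length formula for one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def im2col_bytes(batch: int, channels: int, height: int, width: int,
                 kernel: int, stride: int = 1, padding: int = 0,
                 bytes_per_elem: int = 2) -> int:
    """Size of the unrolled patch matrix: one row of C*K*K values per output pixel."""
    out_h = conv_output_size(height, kernel, stride, padding)
    out_w = conv_output_size(width, kernel, stride, padding)
    return batch * out_h * out_w * channels * kernel * kernel * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical layer: 128 channels at 512x512 resolution with a 3x3 kernel.
    size = im2col_bytes(batch=1, channels=128, height=512, width=512,
                        kernel=3, padding=1)
    print(f"temporary buffer: {size / 2**20:.0f} MiB")
```

Under these assumptions the temporary buffer is nine times larger than the input activation itself for a 3x3 kernel, which is why high-resolution layers dominate peak memory.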

Health Check

Last Commit

9 hours ago

Responsiveness

1 day

Star History

294 stars in the last 30 days