C/C++ inference for Stable Diffusion models
This project provides a pure C/C++ implementation for Stable Diffusion and Flux inference, targeting developers and power users seeking a lightweight, dependency-free solution. It offers broad model support (SD1.x, SD2.x, SDXL, SD3/3.5, Flux), various quantization levels (2-bit to 8-bit integer), and extensive hardware acceleration options including CPU (AVX, AVX2, AVX512), CUDA, Metal, Vulkan, and SYCL.
How It Works
Built upon the ggml library, similar to llama.cpp, this project prioritizes efficiency and minimal dependencies. It supports loading models directly from common formats like .ckpt, .safetensors, and Hugging Face diffusers, eliminating the need for conversion to .ggml or .gguf. Key features include Flash Attention for memory optimization, LoRA support, Latent Consistency Models (LCM), and ESRGAN upscaling.
Quick Start & Requirements
Building from source requires CMake and a C/C++ toolchain: the project is configured with cmake and compiled with cmake --build.
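The two commands below sketch what that typically looks like as an out-of-source CMake build; the -B and --config options, the build directory name, and the idea that GPU backends (CUDA, Metal, Vulkan, SYCL) are selected with -D flags at configure time are general CMake conventions assumed here rather than details taken from the project's README.

```
cmake -B build                        # configure (backend-specific -D options would go here)
cmake --build build --config Release  # compile in Release mode
```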
Highlighted Details
Prompt handling is compatible with stable-diffusion-webui conventions, such as the (token:weight) emphasis syntax.
Maintenance & Community
The project actively lists contributors and references related projects like ggml, stable-diffusion-webui, and ComfyUI.
Licensing & Compatibility
The project is licensed under the MIT License, allowing for commercial use and integration with closed-source applications.
Limitations & Caveats
The README notes that the current ggml_conv_2d implementation is slow and memory-intensive, with ongoing efforts to optimize it. The Metal backend currently handles very large matrices inefficiently. Flash Attention may reduce output quality on some backends and can cause crashes where it is unsupported. Inpainting support is listed as a future TODO.
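As a rough, illustrative calculation of why a naive conv2d is memory-hungry, assume it is lowered to im2col plus a matrix multiply (an assumption about the implementation, used only to convey scale): a single 3x3 convolution over a 64x64 feature map with 512 channels materializes a patch matrix of

```latex
% Illustrative im2col buffer size for one 3x3 conv layer (f32):
(64 \cdot 64)\ \text{positions} \times (3 \cdot 3 \cdot 512)\ \text{values per patch}
  = 4096 \times 4608 \approx 1.9 \times 10^{7}\ \text{floats} \approx 75\ \text{MB}
```

versus roughly 8 MB for the input tensor itself, i.e. a blow-up equal to the kernel area.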