stable-diffusion.cpp  by leejet

C/C++ inference for Stable Diffusion models

Created 2 years ago
4,395 stars

Top 11.2% on SourcePulse

Project Summary

This project provides a pure C/C++ implementation for Stable Diffusion and Flux inference, targeting developers and power users seeking a lightweight, dependency-free solution. It offers broad model support (SD1.x, SD2.x, SDXL, SD3/3.5, Flux), various quantization levels (2-bit to 8-bit integer), and extensive hardware acceleration options including CPU (AVX, AVX2, AVX512), CUDA, Metal, Vulkan, and SYCL.

How It Works

Built upon the ggml library, similar to llama.cpp, this project prioritizes efficiency and minimal dependencies. It supports loading models directly from common formats like .ckpt, .safetensors, and Hugging Face diffusers, eliminating the need for conversion to .ggml or .gguf. Key features include Flash Attention for memory optimization, LoRA support, Latent Consistency Models (LCM), and ESRGAN upscaling.
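To illustrate why a .safetensors file can be read directly, here is a minimal sketch of parsing its header. It relies on the published safetensors layout: an 8-byte little-endian header length N, followed by N bytes of JSON describing each tensor. The function name read_safetensors_header and the 100 MiB sanity cap are assumptions made for this example; this is not the project's actual loader.

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Returns the raw JSON header of a .safetensors file.
// Hypothetical helper for illustration; not part of stable-diffusion.cpp.
std::string read_safetensors_header(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    // The first 8 bytes hold the header length as a little-endian uint64.
    unsigned char len_bytes[8];
    if (!in.read(reinterpret_cast<char*>(len_bytes), 8))
        throw std::runtime_error("file too short for header length");

    // Assemble byte by byte so the result does not depend on host endianness.
    std::uint64_t n = 0;
    for (int i = 7; i >= 0; --i) n = (n << 8) | len_bytes[i];

    // Assumed sanity cap (100 MiB) to avoid huge allocations on corrupt files.
    if (n > (100ull << 20)) throw std::runtime_error("header length too large");

    // The next n bytes are JSON: tensor names, dtypes, shapes, data offsets.
    std::string json(static_cast<std::size_t>(n), '\0');
    if (n > 0 && !in.read(&json[0], static_cast<std::streamsize>(n)))
        throw std::runtime_error("file too short for header");
    return json;
}
```

The raw tensor bytes follow the header, so a loader that understands this layout can locate every weight without a separate conversion step.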

Quick Start & Requirements

  • Install/Run: Download pre-built executables from releases or build from source using CMake.
  • Prerequisites (each is needed only for its own backend): CUDA Toolkit (for CUDA acceleration), ROCm toolkit (for HipBLAS), Vulkan SDK (for Vulkan), Intel® oneAPI Base Toolkit (for SYCL). Recommended VRAM: 4GB+.
  • Setup: To build from source, configure the project with cmake, then compile it with cmake --build.
  • Docs: https://github.com/leejet/stable-diffusion.cpp

Highlighted Details

  • Supports SDXL VAE FP16 NaN fix.
  • Offers 2-bit, 3-bit, 4-bit, 5-bit, and 8-bit integer quantization.
  • Enables GPU acceleration via CUDA, Metal, Vulkan, and SYCL.
  • Integrates ControlNet and PhotoMaker, plus TAESD for faster latent decoding.
  • Embeds generation parameters into PNG output compatible with stable-diffusion-webui.
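The integer quantization listed above can be pictured with a simplified 4-bit block scheme. The sketch below is loosely modeled on ggml-style block quantization (32 weights sharing one scale), but it is an assumption-laden illustration: the Block4 struct, the function names, the float scale, and the rounding rule are chosen for clarity and do not reproduce ggml's exact formats.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

constexpr int kBlockSize = 32;  // weights per block (assumed for this sketch)

// Hypothetical block layout: one scale plus 32 packed 4-bit levels.
struct Block4 {
    float scale;                                 // float here for simplicity
    std::array<std::uint8_t, kBlockSize / 2> q;  // two 4-bit values per byte
};

// Quantizes kBlockSize floats into signed levels in [-7, 7], stored with a +8 offset.
Block4 quantize_block(const float* x) {
    float amax = 0.0f;
    for (int i = 0; i < kBlockSize; ++i) amax = std::max(amax, std::fabs(x[i]));

    Block4 b{};
    b.scale = amax / 7.0f;  // maps [-amax, amax] onto integer levels [-7, 7]
    const float inv = b.scale > 0.0f ? 1.0f / b.scale : 0.0f;

    auto level = [inv](float v) {
        int q = static_cast<int>(std::lround(v * inv)) + 8;  // shift to unsigned
        return static_cast<std::uint8_t>(std::min(15, std::max(0, q)));
    };
    for (int i = 0; i < kBlockSize / 2; ++i) {
        // Low nibble holds the even-indexed weight, high nibble the odd one.
        b.q[i] = static_cast<std::uint8_t>(level(x[2 * i]) | (level(x[2 * i + 1]) << 4));
    }
    return b;
}

// Reconstructs approximate floats from a quantized block.
void dequantize_block(const Block4& b, float* out) {
    for (int i = 0; i < kBlockSize / 2; ++i) {
        out[2 * i]     = b.scale * static_cast<float>((b.q[i] & 0x0F) - 8);
        out[2 * i + 1] = b.scale * static_cast<float>((b.q[i] >> 4) - 8);
    }
}
```

Under these assumptions each block stores 16 bytes of packed levels plus one scale, and the round-trip error per weight is at most half the block's scale. Lower bit widths trade more of that accuracy for smaller model files.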

Maintenance & Community

The README credits the project's contributors and references related projects such as ggml, stable-diffusion-webui, and ComfyUI.

Licensing & Compatibility

The project is licensed under the MIT License, allowing for commercial use and integration with closed-source applications.

Limitations & Caveats

The README notes that the current ggml_conv_2d implementation is slow and memory-intensive, and that work to optimize it is ongoing. The Metal backend is currently inefficient with very large matrices. Flash Attention may reduce output quality on some backends and can cause crashes on backends that do not support it. Inpainting support is listed as a future TODO.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 59
  • Issues (30d): 31
  • Star history: 84 stars in the last 30 days
