swift-diffusion  by liuliu

Single-file Stable Diffusion re-implementation for mobile deployment

created 2 years ago
456 stars

Top 67.3% on sourcepulse

Project Summary

This repository provides a single-file Swift re-implementation of the Stable Diffusion model, including CLIP, UNet, and decoder components, along with PLMS inference. It targets developers and researchers aiming to understand diffusion models or enable Stable Diffusion on Apple mobile devices, offering a path for highly optimized, on-device execution without relying on external runtimes like ONNX.
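For readers unfamiliar with PLMS, the core idea can be sketched in a few lines: the sampler blends the current noise prediction with up to three earlier ones using Adams-Bashforth weights, as in the reference PLMS sampler shipped with the original Stable Diffusion code. This is a minimal illustration on plain numbers, not code from swift-diffusion. The function name `plms_combine` is hypothetical, and the zero-history case is simplified here (the reference sampler performs an extra model evaluation at that step).

```python
def plms_combine(eps_t, old_eps):
    """Blend the current noise prediction with earlier ones (linear multistep).

    eps_t   -- current noise prediction (a float here; a tensor in practice)
    old_eps -- earlier predictions, oldest first, most recent last
    """
    n = len(old_eps)
    if n == 0:
        # Simplification (assumption): use the prediction as-is on the first step.
        return eps_t
    if n == 1:
        # 2nd-order Adams-Bashforth
        return (3 * eps_t - old_eps[-1]) / 2
    if n == 2:
        # 3rd-order Adams-Bashforth
        return (23 * eps_t - 16 * old_eps[-1] + 5 * old_eps[-2]) / 12
    # 4th-order Adams-Bashforth, used once three earlier predictions exist
    return (55 * eps_t - 59 * old_eps[-1] + 37 * old_eps[-2] - 9 * old_eps[-3]) / 24
```

The weights in each branch sum to 1, so a constant prediction history passes through unchanged; the higher-order branches only matter when the prediction changes between steps.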

How It Works

The project re-implements each Stable Diffusion component in Swift, aiming for layer-by-layer output parity with the original PyTorch implementation. This approach makes the model easier to understand and allows the custom, low-level optimizations that resource-constrained mobile environments need. The underlying custom framework (s4nnc) gives fine-grained control over memory usage and kernel selection, which may go beyond what more general-purpose mobile ML frameworks offer.
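A layer-by-layer parity check of this kind can be sketched as follows. This is an illustrative sketch only: the helper names, the layer names in the usage example, and the `1e-3` tolerance are all assumptions, and the repository's own comparison tooling may look quite different. It assumes each layer's output has been dumped as a flat list of floats from both implementations.

```python
def max_abs_diff(reference, candidate):
    """Largest element-wise absolute difference between two flat outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


def first_divergent_layer(reference_layers, candidate_layers, tolerance=1e-3):
    """Return the name of the first layer whose outputs differ by more than
    `tolerance`, or None if every layer matches.

    Both arguments map layer name -> flat list of floats, in execution order
    (Python dicts preserve insertion order).
    """
    for name, reference in reference_layers.items():
        if max_abs_diff(reference, candidate_layers[name]) > tolerance:
            return name
    return None
```

For example, with hypothetical layer dumps:

```python
pytorch_out = {"conv_in": [0.10, 0.20], "mid_block": [1.00, 2.00]}
swift_out = {"conv_in": [0.10, 0.20], "mid_block": [1.00, 2.50]}
print(first_divergent_layer(pytorch_out, swift_out))  # mid_block
```

Reporting the first divergent layer, rather than only comparing final images, narrows a mismatch down to the component that introduced it.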

Quick Start & Requirements

  • Install: Requires Bazel.
  • Dependencies: Swift compiler, CUDA (10.2+) and clang (Linux); Accelerate framework and pthreads (macOS). Model weights (sd-v1.4.ckpt) must be downloaded separately.
  • Setup: Linux requires installing Swift, CUDA, and various system libraries via apt. macOS requires modifying the WORKSPACE file and adding a .bazelrc.local configuration for MPS.
  • Run: bazel run examples:txt2img --compilation_mode=opt -- /path/to/sd-v1.4.ckpt "prompt"
  • Resources: Tested on a 2080 Ti GPU and an M1 Mac Mini (about 95s with FP16). Memory usage is a key focus: FP16 is used to cut it from ~4GB to ~2GB, and INT8 quantization is being considered for further savings.
  • Docs: Bazel Install

Highlighted Details

  • Achieves comparable performance to PyTorch on GPU (15s vs 11s on 2080 Ti), with MPS on M1 Mac Mini taking ~95s.
  • Actively working on memory reduction techniques, including FP16 support (reducing UNet to ~1.9GB) and exploring INT8 quantization.
  • Aims for exact output replication with the original Stable Diffusion model given identical starting conditions.
  • Includes implementations for txt2img, img2img, and inpainting use cases.
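The memory figures above follow from bytes per weight: halving the element width roughly halves the weight footprint. The sketch below works through that arithmetic using the summary's ~4GB starting point. The function name is hypothetical, the estimate covers weights only (not activations or runtime overhead), and the INT8 number it yields is a projection, not a measured result from the project.

```python
# Bytes needed to store one weight in each numeric format.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}


def scaled_weight_size_gb(size_gb, from_dtype, to_dtype):
    """Estimate weight storage after converting between numeric formats.

    Assumes the size is dominated by weights and scales linearly with
    the number of bytes per element.
    """
    return size_gb * BYTES_PER_ELEMENT[to_dtype] / BYTES_PER_ELEMENT[from_dtype]


fp32_gb = 4.0  # approximate starting footprint quoted in the summary
print(scaled_weight_size_gb(fp32_gb, "fp32", "fp16"))  # 2.0
print(scaled_weight_size_gb(fp32_gb, "fp32", "int8"))  # 1.0
```

The FP16 estimate lines up with the ~2GB target and the ~1.9GB UNet figure reported above.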

Maintenance & Community

The project appears to be a personal educational effort by a single developer, with no explicit mention of community channels, roadmap, or other contributors in the README.

Licensing & Compatibility

The README does not specify a license. This is a critical omission for evaluating commercial use or integration into closed-source projects.

Limitations & Caveats

The setup process is complex, requiring Bazel and specific system dependencies. Inpainting functionality is noted as not working without prompt guidance. The lack of a specified license presents a significant adoption blocker.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 90 days
