swift-diffusion by liuliu

Single-file Stable Diffusion re-implementation for mobile deployment

Created 3 years ago
458 stars

Top 66.1% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a single-file Swift re-implementation of the Stable Diffusion model, including CLIP, UNet, and decoder components, along with PLMS inference. It targets developers and researchers aiming to understand diffusion models or enable Stable Diffusion on Apple mobile devices, offering a path for highly optimized, on-device execution without relying on external runtimes like ONNX.
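The PLMS sampler mentioned above refines each denoising step by blending the current noise prediction with a history of previous ones using fourth-order Adams-Bashforth coefficients. A minimal numeric sketch of that idea (the toy `eps_model` and the Euler-style state update are illustrative stand-ins, not the repo's API; real PLMS transfers state through the diffusion alpha-bar schedule):

```python
import numpy as np

def eps_model(x, t):
    # Stand-in for the UNet noise predictor (the real model predicts added noise).
    return 0.1 * x

def plms_step(x, t, t_next, eps_history):
    eps = eps_model(x, t)
    if len(eps_history) < 3:
        # Warm-up: not enough history yet, fall back to the plain prediction.
        eps_prime = eps
    else:
        # 4th-order Adams-Bashforth combination of the last four predictions.
        e1, e2, e3 = eps_history[-1], eps_history[-2], eps_history[-3]
        eps_prime = (55 * eps - 59 * e1 + 37 * e2 - 9 * e3) / 24
    eps_history.append(eps)
    dt = t_next - t
    return x + dt * eps_prime  # simplified update for illustration

x = np.ones(4)
history = []
ts = np.linspace(1.0, 0.0, 11)
for t, t_next in zip(ts[:-1], ts[1:]):
    x = plms_step(x, t, t_next, history)
print(x)
```

The multistep reuse of cached predictions is what lets PLMS reach good sample quality with one model evaluation per step.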

How It Works

The project meticulously re-implements Stable Diffusion's components in Swift, aiming for layer-by-layer output parity with the original PyTorch implementation. This approach facilitates deep understanding and enables custom, low-level optimizations crucial for resource-constrained mobile environments. The use of a custom framework (s4nnc) allows for fine-grained control over memory usage and kernel selection, potentially surpassing the capabilities of more general-purpose mobile ML frameworks.
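Layer-by-layer parity of this kind is typically verified by feeding identical inputs to both implementations and comparing intermediate activations within a tolerance. A hedged sketch of such a check (layer names, dump format, and tolerances here are illustrative, not taken from the repo):

```python
import numpy as np

def check_parity(reference_outputs, ported_outputs, atol=1e-4, rtol=1e-3):
    """Compare per-layer activations from two implementations.

    Both arguments map layer name -> ndarray (e.g. dumped from the PyTorch
    reference and from the Swift port). Returns the layers whose outputs
    diverge beyond tolerance, with the worst absolute error for each.
    """
    mismatches = []
    for name, ref in reference_outputs.items():
        out = ported_outputs[name]
        if not np.allclose(ref, out, atol=atol, rtol=rtol):
            worst = float(np.max(np.abs(ref - out)))
            mismatches.append((name, worst))
    return mismatches

# Toy usage: identical activations pass, a perturbed layer is flagged.
ref = {"conv_in": np.ones((1, 4, 4)), "mid_block": np.full((1, 4, 4), 0.5)}
port = {"conv_in": np.ones((1, 4, 4)), "mid_block": np.full((1, 4, 4), 0.5) + 1e-2}
print(check_parity(ref, port))  # flags "mid_block"
```

Running such a check after porting each block localizes numerical divergence to a single layer instead of debugging the full pipeline end to end.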

Quick Start & Requirements

  • Install: Requires Bazel.
  • Dependencies: Swift compiler; CUDA (10.2+) and clang on Linux; Accelerate framework and pthreads on macOS. Model weights (sd-v1.4.ckpt) must be downloaded separately.
  • Setup: Linux requires installing Swift, CUDA, and various system libraries via apt. macOS requires modifying the WORKSPACE file and adding a .bazelrc.local configuration for MPS.
  • Run: bazel run examples:txt2img --compilation_mode=opt -- /path/to/sd-v1.4.ckpt "prompt"
  • Resources: GPU (2080 Ti tested), M1 Mac Mini (95s for FP16). Memory usage is a key focus, with efforts to reduce it from ~4GB to ~2GB via FP16 and potential INT8 quantization.
  • Docs: Bazel Install

Highlighted Details

  • Achieves comparable performance to PyTorch on GPU (15s vs 11s on 2080 Ti), with MPS on M1 Mac Mini taking ~95s.
  • Actively working on memory reduction techniques, including FP16 support (reducing UNet to ~1.9GB) and exploring INT8 quantization.
  • Aims for exact output replication with the original Stable Diffusion model given identical starting conditions.
  • Includes implementations for txt2img, img2img, and inpainting use cases.
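The memory figures above follow largely from parameter precision. Assuming the SD v1 UNet's roughly 860M parameters (a commonly cited figure for v1.x, not stated in the README; the README's ~1.9GB FP16 number also includes activations and workspace), the back-of-envelope weight sizes are:

```python
# Back-of-envelope weight-memory estimate for the SD v1 UNet.
# ~860M parameters is assumed here; these are lower bounds for weights
# alone, excluding activations and runtime workspace.
params = 860_000_000

for label, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{label}: ~{gib:.2f} GiB of weights")
```

This is why FP16 roughly halves the footprint relative to FP32, and why INT8 quantization is the natural next step the project is exploring.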

Maintenance & Community

The project appears to be a personal educational effort by a single developer, with no explicit mention of community channels, roadmap, or other contributors in the README.

Licensing & Compatibility

The README does not specify a license. This is a critical omission for evaluating commercial use or integration into closed-source projects.

Limitations & Caveats

The setup process is complex, requiring Bazel and specific system dependencies. Inpainting functionality is noted as not working without prompt guidance. The lack of a specified license presents a significant adoption blocker.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Jiaming Song (Chief Scientist at Luma AI).

tomesd by dbolya

Speed-up tool for Stable Diffusion
1k stars · Top 0.3% · Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

Image synthesis research paper using a linear diffusion transformer
4k stars · Top 0.4% · Created 11 months ago · Updated 5 days ago