swift-diffusion by liuliu

Single-file Stable Diffusion re-implementation for mobile deployment

Created 3 years ago
458 stars

Top 66.1% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a single-file Swift re-implementation of the Stable Diffusion model, including CLIP, UNet, and decoder components, along with PLMS inference. It targets developers and researchers aiming to understand diffusion models or enable Stable Diffusion on Apple mobile devices, offering a path for highly optimized, on-device execution without relying on external runtimes like ONNX.
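The PLMS sampler mentioned above refines each denoising step by blending the current noise prediction with a history of previous ones using fourth-order Adams-Bashforth coefficients. A minimal numeric sketch of that idea (the toy `eps_model` and the Euler-style state update are illustrative stand-ins, not the repo's API; real PLMS transfers state through the diffusion alpha-bar schedule):

```python
import numpy as np

def eps_model(x, t):
    # Stand-in for the UNet noise predictor (the real model predicts added noise).
    return 0.1 * x

def plms_step(x, t, t_next, eps_history):
    eps = eps_model(x, t)
    if len(eps_history) < 3:
        # Warm-up: not enough history yet, fall back to the plain prediction.
        eps_prime = eps
    else:
        # 4th-order Adams-Bashforth combination of the last four predictions.
        e1, e2, e3 = eps_history[-1], eps_history[-2], eps_history[-3]
        eps_prime = (55 * eps - 59 * e1 + 37 * e2 - 9 * e3) / 24
    eps_history.append(eps)
    dt = t_next - t
    return x + dt * eps_prime  # simplified update for illustration

x = np.ones(4)
history = []
ts = np.linspace(1.0, 0.0, 11)
for t, t_next in zip(ts[:-1], ts[1:]):
    x = plms_step(x, t, t_next, history)
print(x)
```

The multistep reuse of cached predictions is what lets PLMS reach good sample quality with one model evaluation per step.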

How It Works

The project meticulously re-implements Stable Diffusion's components in Swift, aiming for layer-by-layer output parity with the original PyTorch implementation. This approach facilitates deep understanding and enables custom, low-level optimizations crucial for resource-constrained mobile environments. The use of a custom framework (s4nnc) allows for fine-grained control over memory usage and kernel selection, potentially surpassing the capabilities of more general-purpose mobile ML frameworks.
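Layer-by-layer parity of this kind is typically verified by feeding identical inputs to both implementations and comparing intermediate activations within a tolerance. A hedged sketch of such a check (layer names, dump format, and tolerances here are illustrative, not taken from the repo):

```python
import numpy as np

def check_parity(reference_outputs, ported_outputs, atol=1e-4, rtol=1e-3):
    """Compare per-layer activations from two implementations.

    Both arguments map layer name -> ndarray (e.g. dumped from the PyTorch
    reference and from the Swift port). Returns the layers whose outputs
    diverge beyond tolerance, with the worst absolute error for each.
    """
    mismatches = []
    for name, ref in reference_outputs.items():
        out = ported_outputs[name]
        if not np.allclose(ref, out, atol=atol, rtol=rtol):
            worst = float(np.max(np.abs(ref - out)))
            mismatches.append((name, worst))
    return mismatches

# Toy usage: identical activations pass, a perturbed layer is flagged.
ref = {"conv_in": np.ones((1, 4, 4)), "mid_block": np.full((1, 4, 4), 0.5)}
port = {"conv_in": np.ones((1, 4, 4)), "mid_block": np.full((1, 4, 4), 0.5) + 1e-2}
print(check_parity(ref, port))  # flags "mid_block"
```

Running such a check after porting each block localizes numerical divergence to a single layer instead of debugging the full pipeline end to end.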

Quick Start & Requirements

  • Install: Requires Bazel.
  • Dependencies: Swift compiler; CUDA (10.2+) and clang on Linux; Accelerate framework and pthreads on macOS. Model weights (sd-v1.4.ckpt) must be downloaded separately.
  • Setup: Linux requires installing Swift, CUDA, and various system libraries via apt. macOS requires modifying the WORKSPACE file and adding a .bazelrc.local configuration for MPS.
  • Run: bazel run examples:txt2img --compilation_mode=opt -- /path/to/sd-v1.4.ckpt "prompt"
  • Resources: GPU (2080 Ti tested), M1 Mac Mini (95s for FP16). Memory usage is a key focus, with efforts to reduce it from ~4GB to ~2GB via FP16 and potential INT8 quantization.
  • Docs: Bazel Install

Highlighted Details

  • Achieves comparable performance to PyTorch on GPU (15s vs 11s on 2080 Ti), with MPS on M1 Mac Mini taking ~95s.
  • Actively working on memory reduction techniques, including FP16 support (reducing UNet to ~1.9GB) and exploring INT8 quantization.
  • Aims for exact output replication with the original Stable Diffusion model given identical starting conditions.
  • Includes implementations for txt2img, img2img, and inpainting use cases.
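The memory figures above follow largely from parameter precision. Assuming the SD v1 UNet's roughly 860M parameters (a commonly cited figure for v1.x, not stated in the README; the README's ~1.9GB FP16 number also includes activations and workspace), the back-of-envelope weight sizes are:

```python
# Back-of-envelope weight-memory estimate for the SD v1 UNet.
# ~860M parameters is assumed here; these are lower bounds for weights
# alone, excluding activations and runtime workspace.
params = 860_000_000

for label, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{label}: ~{gib:.2f} GiB of weights")
```

This is why FP16 roughly halves the footprint relative to FP32, and why INT8 quantization is the natural next step the project is exploring.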

Maintenance & Community

The project appears to be a personal educational effort by a single developer, with no explicit mention of community channels, roadmap, or other contributors in the README.

Licensing & Compatibility

The README does not specify a license. This is a critical omission for evaluating commercial use or integration into closed-source projects.

Limitations & Caveats

The setup process is complex, requiring Bazel and specific system dependencies. Inpainting functionality is noted as not working without prompt guidance. The lack of a specified license presents a significant adoption blocker.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral) and Jiaming Song (Chief Scientist at Luma AI).

tomesd by dbolya

Speed-up tool for Stable Diffusion
1k stars · Top 0.3% · Created 2 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Zhiqiang Xie (Coauthor of SGLang), and 1 more.

Sana by NVlabs

Image synthesis research paper using a linear diffusion transformer
4k stars · Top 0.4% · Created 11 months ago · Updated 5 days ago