This repository provides tools and Swift packages for running Stable Diffusion image generation models on Apple Silicon using Core ML. It targets developers building macOS, iOS, and iPadOS applications who want on-device image generation with optimized performance and a reduced memory footprint.
How It Works
The project leverages coremltools to convert PyTorch Stable Diffusion models into the Core ML format. This conversion process includes optimizations like weight quantization (down to 6-bit and even lower with Mixed-Bit Palettization) and attention mechanism implementations (SPLIT_EINSUM, SPLIT_EINSUM_V2) tailored for Apple's Neural Engine and GPUs. The resulting .mlpackage or .mlmodelc files can then be directly integrated into Xcode projects via a Swift package for on-device inference.
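To make the idea of palettization concrete, here is a minimal sketch in plain Python. It is not the project's conversion code: the function names are hypothetical, and the palette is spaced evenly for brevity, whereas coremltools offers k-means clustering among its palettization modes. The point is only that each weight is replaced by a small index into a lookup table of 2^n values.

```python
# Illustrative sketch only: uniform palettization of a list of weights.
# Function names are hypothetical and do not come from the repository.

def palettize(weights, nbits):
    """Return (palette, indices) where each index selects the nearest palette entry."""
    levels = 2 ** nbits
    lo, hi = min(weights), max(weights)
    # Guard against a zero step when all weights are identical.
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    palette = [lo + i * step for i in range(levels)]
    indices = [min(levels - 1, max(0, round((w - lo) / step))) for w in weights]
    return palette, indices


def depalettize(palette, indices):
    """Reconstruct approximate weights from the lookup table."""
    return [palette[i] for i in indices]


weights = [-0.50, -0.12, 0.03, 0.27, 0.49, 0.50]
palette, indices = palettize(weights, nbits=2)  # 2 bits -> 4 palette entries
restored = depalettize(palette, indices)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(indices, round(max_error, 4))
```

With fewer bits the palette shrinks and the reconstruction error grows, which is the trade-off that a mixed-bit scheme manages by choosing a bit width per layer.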
Quick Start & Requirements
- Model Conversion: Requires macOS 13.1+, Python 3.8, and coremltools 7.0.
- Project Build: Requires macOS 13.1+, Xcode 14.3+, and Swift 5.8.
- Target Devices: macOS 13.1+ or iPadOS/iOS 16.2+. The memory improvements require macOS 14.0+ or iPadOS/iOS 17.0+.
- Hardware: Minimum M1 (Mac), M1 (iPad), A14 (iPhone).
- Installation: Clone the repository, set up a Python environment (conda activate coreml_stable_diffusion, pip install -e .), and log in to the Hugging Face CLI. Model conversion is done via the python -m python_coreml_stable_diffusion.torch2coreml ... command.
- Swift Integration: Add the StableDiffusion Swift package to Xcode projects.
- Resources: Core ML Tools Docs, WWDC23 Session.
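Before attempting a conversion, it can help to confirm that the host meets the listed requirements. The following is a small sketch using only the Python standard library. It treats the macOS and Python versions above as minimums, which is an assumption for the Python version, and the helper names are hypothetical rather than part of the repository.

```python
import platform
import sys

# Values taken from the model-conversion requirements listed above.
# Treating Python 3.8 as a minimum (not an exact pin) is an assumption.
MIN_MACOS = (13, 1)
MIN_PYTHON = (3, 8)


def parse_version(text):
    """Turn a string such as '13.4.1' into (13, 4, 1); non-numeric parts are skipped."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(version, minimum):
    """Compare version tuples element by element, as Python tuples do natively."""
    return tuple(version) >= tuple(minimum)


def check_environment():
    """Report whether the current host satisfies the assumed minimums."""
    mac = platform.mac_ver()[0]  # empty string on non-macOS systems
    return {
        "macos_ok": bool(mac) and meets_minimum(parse_version(mac), MIN_MACOS),
        "python_ok": meets_minimum(sys.version_info[:2], MIN_PYTHON),
    }


if __name__ == "__main__":
    print(check_environment())
```

The check does not cover coremltools, Xcode, or Swift versions; those would need to be verified separately.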
Highlighted Details
- Supports various Stable Diffusion versions (v1.4, v1.5, v2.1, XL, v3) with optimized Core ML models available on Hugging Face Hub.
- Advanced compression techniques like Mixed-Bit Palettization (MBP) and Activation Quantization (W8A8) significantly reduce model size and improve inference speed on Apple Silicon.
- Includes support for ControlNet and multilingual text encoders using Apple's NaturalLanguage framework.
- Provides detailed performance benchmarks across different Apple devices, demonstrating latency and diffusion speed improvements with various optimization techniques.
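As a rough illustration of why lower bit widths shrink the model, the arithmetic below estimates weight storage at several precisions. The parameter count is a hypothetical round number, not a figure from the repository, and the estimate ignores lookup-table overhead and any layers left unquantized.

```python
def model_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 1_000_000_000  # hypothetical parameter count, for illustration only

for bits in (16, 8, 6, 4):
    print(f"{bits:>2}-bit weights: {model_size_gb(NUM_PARAMS, bits):.2f} GB")
```

Under these assumptions, moving from 16-bit to 6-bit weights cuts storage to 37.5% of the original, since size scales linearly with bits per weight.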
Maintenance & Community
- Developed by Apple, with contributions from Hugging Face and the community.
- Active development indicated by support for newer models like Stable Diffusion 3.
- Hugging Face Diffusers App serves as a demo and reference implementation.
Licensing & Compatibility
- The repository itself is licensed under the Apache License 2.0.
- Core ML models are typically distributed under licenses from their original creators (e.g., Stability AI, CompVis), which should be reviewed for commercial use.
Limitations & Caveats
- Model conversion can be memory-intensive, potentially requiring workarounds on systems with limited RAM (e.g., 8GB).
- While optimizations aim for high fidelity, minor differences in generated images compared to PyTorch are possible due to floating-point precision and RNG variations.
- Some advanced features like ControlNet for SDXL are not yet supported.
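The floating-point caveat can be illustrated with a short standard-library sketch. It assumes, as an illustration rather than a statement from this document, that the Core ML models compute in 16-bit floats while the PyTorch reference uses wider floats. Rounding a value to half precision and back shows the size of the per-value error that can accumulate across diffusion steps.

```python
import struct


def to_float16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


values = [0.1, 1.0, 3.14159, 1000.1]
for v in values:
    rounded = to_float16(v)
    print(f"{v!r:>10} -> {rounded!r:<20} abs error {abs(v - rounded):.6g}")
```

Values that are exactly representable, such as 1.0, survive unchanged, while most others pick up a small rounding error.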