BK-SDM by Nota-NetsPresso

Compressed Stable Diffusion research paper for efficient text-to-image generation

Created 2 years ago · 293 stars · Top 91.2% on sourcepulse

View on GitHub
Project Summary

BK-SDM offers a compressed Stable Diffusion model for efficient text-to-image generation, targeting researchers and developers seeking faster, lighter, and more cost-effective diffusion models. It achieves this by removing specific residual and attention blocks from the U-Net architecture, enabling significant speedups and reduced resource requirements without substantial quality degradation.

How It Works

BK-SDM employs knowledge distillation to train a smaller, block-removed U-Net derived from Stable Diffusion. Certain residual and attention blocks are strategically removed, and the compressed network is then pretrained with distillation on a limited amount of data, retaining high-quality generation while drastically reducing computational overhead. This makes the model well suited to resource-constrained devices and to faster inference.
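
The training objective described in the paper combines the usual denoising loss with output-level and feature-level distillation from the original (teacher) U-Net. The following is a minimal PyTorch-style sketch of that combination, assuming diffusers-style U-Nets; the function name, loss weights, and the way intermediate features are gathered are illustrative rather than the repository's actual code.

    import torch
    import torch.nn.functional as F

    def bk_sdm_loss(student_unet, teacher_unet, noisy_latents, timesteps, text_emb,
                    noise, student_feats, teacher_feats,
                    lambda_out=1.0, lambda_feat=1.0):
        # Task loss: the usual denoising objective against the sampled noise.
        student_pred = student_unet(noisy_latents, timesteps, text_emb).sample
        with torch.no_grad():
            teacher_pred = teacher_unet(noisy_latents, timesteps, text_emb).sample
        loss_task = F.mse_loss(student_pred, noise)
        # Output-level KD: push the student's predicted noise toward the teacher's.
        loss_out_kd = F.mse_loss(student_pred, teacher_pred)
        # Feature-level KD: student_feats / teacher_feats are matched lists of
        # intermediate block activations, assumed collected via forward hooks.
        loss_feat_kd = sum(F.mse_loss(s, t) for s, t in zip(student_feats, teacher_feats))
        return loss_task + lambda_out * loss_out_kd + lambda_feat * loss_feat_kd

In the paper, the feature-level term compares activations at corresponding U-Net stages, which helps the block-removed student mimic the teacher despite the limited training data.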

Quick Start & Requirements

  • Install: Clone the repository and install requirements: git clone https://github.com/Nota-NetsPresso/BK-SDM.git && cd BK-SDM && pip install -r requirements.txt
  • Prerequisites: Python 3.8, PyTorch (1.13.1 or 2.0.1+ depending on the task), and a CUDA-capable GPU.
  • Resources: Distillation pretraining can require substantial GPU memory (roughly 28-53 GB for single-GPU training at batch sizes of 64-256); inference is far lighter, with ~4-second generation times reported on an iPhone 14.
  • Links: Diffusers Integration (see the usage sketch below), Hugging Face Models, Gradio Demo
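
For a quick functional check, the released checkpoints can be loaded directly with diffusers. A minimal sketch, assuming the nota-ai/bk-sdm-small checkpoint from the project's Hugging Face page and a CUDA-capable GPU:

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a compressed checkpoint (swap in bk-sdm-base or bk-sdm-tiny as needed).
    pipe = StableDiffusionPipeline.from_pretrained(
        "nota-ai/bk-sdm-small", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("a tropical bird sitting on a branch, watercolor").images[0]
    image.save("bk_sdm_sample.png")

The Base, Small, and Tiny variants share the same pipeline interface, so swapping the model id is the only change needed.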

Highlighted Details

  • Achieves 4-second inference on iPhone 14.
  • Compatible with SD v1 and v2 architectures.
  • Supports DreamBooth finetuning with LoRA for personalized generation (see the sketch after this list).
  • Offers multiple model sizes (Base, Small, Tiny) and dataset variants (0.22M, 2.3M LAION pairs).
  • Core ML weights available for iOS/macOS applications.
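
For the personalization path, one plausible way to apply a DreamBooth-trained LoRA on top of a compressed base model is diffusers' generic LoRA loading. The sketch below is illustrative (the output directory is hypothetical); the repository's own finetuning scripts document the actual training procedure.

    import torch
    from diffusers import StableDiffusionPipeline

    # Start from a compressed base model and attach LoRA weights produced by
    # DreamBooth finetuning (the ./dreambooth-lora-output path is hypothetical).
    pipe = StableDiffusionPipeline.from_pretrained(
        "nota-ai/bk-sdm-base", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("./dreambooth-lora-output")

    image = pipe("a photo of sks dog in a bucket").images[0]
    image.save("personalized_sample.png")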

Maintenance & Community

Work on BK-SDM has been presented at ICML 2023, ICCV 2023, and ECCV 2024. Notable third-party adoptions include implementations by Segmind and KOALA. Community interaction takes place through Hugging Face Spaces and the associated demos.

Licensing & Compatibility

The project is released under the CreativeML Open RAIL-M license. This license permits redistribution, commercial use, and use as a service, provided that the same use restrictions are included and a copy of the license is shared with users. It prohibits the deliberate production or sharing of illegal or harmful outputs.

Limitations & Caveats

While effective, the distillation pretraining process can be computationally intensive, requiring substantial GPU resources. The README notes that different batch sizes during evaluation can lead to slightly different generation scores due to variations in random latent code sampling.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days
