distill-sd by segmind

Diffusion model distillation for smaller, faster Stable Diffusion

created 2 years ago
607 stars

Top 54.7% on sourcepulse

Project Summary

This repository provides knowledge-distilled, smaller versions of Stable Diffusion models, offering up to 50% reduction in size and faster inference. It's targeted at researchers and developers looking to optimize Stable Diffusion for resource-constrained environments or faster iteration cycles, enabling high-quality image generation with reduced computational overhead.

How It Works

The project implements knowledge distillation, in which a smaller "student" U-Net learns to mimic a larger "teacher" U-Net (specifically SG161222/Realistic_Vision_V4.0). The training loss combines three MSE terms: between the student's predicted noise and the ground-truth noise, between the student's and teacher's final outputs, and between their intermediate block activations. This multi-level distillation aims to preserve image quality while significantly reducing model size and inference time.
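
A minimal sketch of that combined objective, assuming PyTorch and hypothetical names for the captured block activations (the repo's actual training script may organize this differently):

```python
import torch.nn.functional as F

def distillation_loss(student_noise_pred, teacher_noise_pred, target_noise,
                      student_features, teacher_features,
                      w_output=0.5, w_feature=0.5):
    """Combined loss: denoising MSE + output-level KD + block-level (feature) KD.

    student_features / teacher_features are lists of intermediate U-Net block
    outputs captured with forward hooks (hypothetical; weights are illustrative).
    """
    # Ordinary denoising objective: predict the noise that was added.
    task_loss = F.mse_loss(student_noise_pred, target_noise)

    # Output-level distillation: match the teacher's final noise prediction.
    output_kd = F.mse_loss(student_noise_pred, teacher_noise_pred)

    # Feature-level distillation: match intermediate block activations.
    feature_kd = sum(F.mse_loss(s, t) for s, t in zip(student_features, teacher_features))

    return task_loss + w_output * output_kd + w_feature * feature_kd
```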

Quick Start & Requirements

  • Install/Run: Use the diffusers library for inference; a minimal example appears after this list.
  • Prerequisites: PyTorch and the diffusers library; a CUDA-capable GPU is recommended for inference. Training additionally requires accelerate and potentially large image-caption datasets.
  • Links: Hugging Face repos for small-sd and tiny-sd.
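
A minimal inference sketch, assuming the segmind/small-sd checkpoint published on Hugging Face (substitute segmind/tiny-sd for the smaller variant); the prompt and sampler settings are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the distilled checkpoint; fp16 keeps the VRAM footprint low on CUDA GPUs.
pipe = StableDiffusionPipeline.from_pretrained(
    "segmind/small-sd",            # or "segmind/tiny-sd"
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a portrait photo of an astronaut, studio lighting"
image = pipe(prompt, num_inference_steps=25, guidance_scale=7.5).images[0]
image.save("output.png")
```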

Highlighted Details

  • Achieves up to 100% faster inference and up to 30% lower VRAM footprint compared to standard Stable Diffusion.
  • Offers "sd_small" (579M parameters) and "sd_tiny" (323M parameters) variants; a parameter-count check appears after this list.
  • Training scripts for knowledge distillation, checkpoint fine-tuning, and LoRA training are included.
  • Pre-trained checkpoints for general use and fine-tuned on portrait images are available.
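
One way to sanity-check the reported sizes, assuming the segmind/small-sd and segmind/tiny-sd checkpoints on Hugging Face, is to load just the U-Net and count its parameters:

```python
from diffusers import UNet2DConditionModel

for repo in ("segmind/small-sd", "segmind/tiny-sd"):
    unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
    n_params = sum(p.numel() for p in unet.parameters())
    print(f"{repo}: {n_params / 1e6:.0f}M U-Net parameters")
```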

Maintenance & Community

  • The project is associated with Segmind.
  • Cites the BK-SDM research paper (presented at an ICML workshop) as the basis for the distillation approach.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The underlying diffusers library is licensed under Apache 2.0. Pre-trained models on Hugging Face may carry their own licenses.

Limitations & Caveats

The distilled models are at an early stage and may not yet produce production-quality general-purpose outputs. They are best suited for fine-tuning or LoRA training on specific concepts or styles, and may struggle with composability and multi-concept generation. The repo notes a potential issue with config.json when resuming from checkpoints, which requires manually replacing the file.
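
Since the models are positioned as bases for concept-specific adaptation, a typical workflow loads LoRA weights on top of the distilled pipeline; the LoRA path below is hypothetical:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "segmind/tiny-sd", torch_dtype=torch.float16
).to("cuda")

# Hypothetical LoRA weights fine-tuned on a specific concept/style.
pipe.load_lora_weights("path/to/concept_lora")

image = pipe("a photo in the trained style", num_inference_steps=25).images[0]
image.save("lora_output.png")
```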

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
