distill-sd by segmind

Diffusion model distillation for smaller, faster Stable Diffusion

created 2 years ago
607 stars

Top 54.7% on sourcepulse

Project Summary

This repository provides knowledge-distilled, smaller versions of Stable Diffusion models, offering up to 50% reduction in size and faster inference. It's targeted at researchers and developers looking to optimize Stable Diffusion for resource-constrained environments or faster iteration cycles, enabling high-quality image generation with reduced computational overhead.

How It Works

The project implements knowledge distillation, in which a smaller "student" U-Net learns to mimic a larger "teacher" U-Net (specifically SG161222/Realistic_Vision_V4.0). The training loss combines three MSE terms: between the student's predicted noise and the ground-truth noise, between the student's and teacher's final outputs, and between their intermediate block activations. This multi-level distillation aims to preserve image quality while significantly reducing model size and inference time.
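
A minimal sketch of that combined objective, assuming PyTorch and hypothetical names for the captured block activations (the repo's actual training script may organize this differently):

```python
import torch.nn.functional as F

def distillation_loss(student_noise_pred, teacher_noise_pred, target_noise,
                      student_features, teacher_features,
                      w_output=0.5, w_feature=0.5):
    """Combined loss: denoising MSE + output-level KD + block-level (feature) KD.

    student_features / teacher_features are lists of intermediate U-Net block
    outputs captured with forward hooks (hypothetical; weights are illustrative).
    """
    # Ordinary denoising objective: predict the noise that was added.
    task_loss = F.mse_loss(student_noise_pred, target_noise)

    # Output-level distillation: match the teacher's final noise prediction.
    output_kd = F.mse_loss(student_noise_pred, teacher_noise_pred)

    # Feature-level distillation: match intermediate block activations.
    feature_kd = sum(F.mse_loss(s, t) for s, t in zip(student_features, teacher_features))

    return task_loss + w_output * output_kd + w_feature * feature_kd
```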

Quick Start & Requirements

  • Install/Run: Use the diffusers library for inference; a minimal example appears after this list.
  • Prerequisites: PyTorch and the diffusers library; a CUDA-capable GPU is recommended for inference. Training additionally requires accelerate and potentially large image-caption datasets.
  • Links: Hugging Face repos for small-sd and tiny-sd.
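
A minimal inference sketch, assuming the segmind/small-sd checkpoint published on Hugging Face (substitute segmind/tiny-sd for the smaller variant); the prompt and sampler settings are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the distilled checkpoint; fp16 keeps the VRAM footprint low on CUDA GPUs.
pipe = StableDiffusionPipeline.from_pretrained(
    "segmind/small-sd",            # or "segmind/tiny-sd"
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a portrait photo of an astronaut, studio lighting"
image = pipe(prompt, num_inference_steps=25, guidance_scale=7.5).images[0]
image.save("output.png")
```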

Highlighted Details

  • Achieves up to 100% faster inference and up to 30% lower VRAM footprint compared to standard Stable Diffusion.
  • Offers "sd_small" (579M parameters) and "sd_tiny" (323M parameters) variants; a parameter-count check appears after this list.
  • Training scripts for knowledge distillation, checkpoint fine-tuning, and LoRA training are included.
  • Pre-trained checkpoints for general use and fine-tuned on portrait images are available.
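
One way to sanity-check the reported sizes, assuming the segmind/small-sd and segmind/tiny-sd checkpoints on Hugging Face, is to load just the U-Net and count its parameters:

```python
from diffusers import UNet2DConditionModel

for repo in ("segmind/small-sd", "segmind/tiny-sd"):
    unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
    n_params = sum(p.numel() for p in unet.parameters())
    print(f"{repo}: {n_params / 1e6:.0f}M U-Net parameters")
```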

Maintenance & Community

  • The project is associated with Segmind.
  • Cites the BK-SDM research paper (presented at an ICML workshop) as the basis for the distillation approach.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The underlying diffusers library is licensed under Apache 2.0. Pre-trained models on Hugging Face may carry their own licenses.

Limitations & Caveats

The distilled models are at an early stage and may not yet produce production-quality general-purpose outputs. They are best suited for fine-tuning or LoRA training on specific concepts or styles, and may struggle with composability and multi-concept generation. The repo notes a potential issue with config.json when resuming from checkpoints, which requires manually replacing the file.
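
Since the models are positioned as bases for concept-specific adaptation, a typical workflow loads LoRA weights on top of the distilled pipeline; the LoRA path below is hypothetical:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "segmind/tiny-sd", torch_dtype=torch.float16
).to("cuda")

# Hypothetical LoRA weights fine-tuned on a specific concept/style.
pipe.load_lora_weights("path/to/concept_lora")

image = pipe("a photo in the trained style", num_inference_steps=25).images[0]
image.save("lora_output.png")
```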

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
