Diffusion model distillation for smaller, faster Stable Diffusion
This repository provides knowledge-distilled, smaller versions of Stable Diffusion models, offering up to 50% reduction in size and faster inference. It's targeted at researchers and developers looking to optimize Stable Diffusion for resource-constrained environments or faster iteration cycles, enabling high-quality image generation with reduced computational overhead.
How It Works
The project implements knowledge distillation, where a smaller "student" U-Net model learns to mimic the outputs of a larger "teacher" U-Net (specifically SG161222/Realistic_Vision_V4.0). The training loss combines MSE between predicted noise and actual noise, and MSE between the final outputs and intermediate block outputs of the teacher and student models. This multi-level distillation approach aims to preserve image quality while significantly reducing model size and inference time.
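The combined objective can be sketched as follows; this is a minimal illustration in PyTorch, with hypothetical loss-weight names (`w_output`, `w_feat`) and the assumption that intermediate features are collected as paired, shape-compatible lists (the repo's actual code may differ):

```python
import torch.nn.functional as F

def distillation_loss(student_noise, teacher_noise, target_noise,
                      student_feats, teacher_feats,
                      w_output=0.5, w_feat=0.5):
    # Task loss: the student predicts the noise added in the forward
    # diffusion step (the standard denoising objective).
    loss_task = F.mse_loss(student_noise, target_noise)
    # Output-level distillation: match the teacher's final noise prediction.
    loss_output = F.mse_loss(student_noise, teacher_noise.detach())
    # Feature-level distillation: match intermediate U-Net block outputs.
    loss_feat = sum(
        F.mse_loss(s, t.detach())
        for s, t in zip(student_feats, teacher_feats)
    )
    return loss_task + w_output * loss_output + w_feat * loss_feat
```

Detaching the teacher tensors ensures gradients flow only through the student.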
Quick Start & Requirements
Inference uses the `diffusers` library; a minimal example sketch follows below. A GPU with CUDA is recommended for inference. Training requires `accelerate` and potentially large datasets.
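A minimal inference sketch, assuming the `segmind/small-sd` distilled checkpoint as the model ID (verify the exact repo ID on the project's Hugging Face page):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a distilled checkpoint; the model ID here is an assumption,
# so substitute the ID published by this project.
pipe = StableDiffusionPipeline.from_pretrained(
    "segmind/small-sd", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # CUDA GPU recommended for inference

image = pipe(
    "a portrait photo, natural light, 85mm lens",
    num_inference_steps=25,
).images[0]
image.save("out.png")
```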
Highlighted Details
Maintenance & Community
Licensing & Compatibility
The `diffusers` library is licensed under Apache 2.0. Pre-trained models on Hugging Face may have their own licenses.
Limitations & Caveats
The distilled models are in an early phase and may not yet achieve production-quality general outputs. They are best suited for fine-tuning or LoRA training on specific concepts/styles, and may struggle with composability or multi-concept generation. A note mentions a potential issue with `config.json` when resuming from checkpoints, requiring manual replacement (a hypothetical workaround is sketched below).
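A hypothetical workaround sketch for that note; the paths and the assumption that the U-Net config is the file to replace are illustrative only (consult the repo's note for the authoritative fix):

```python
import shutil

# Paths are assumptions for illustration.
good_config = "base_model/unet/config.json"               # known-good config
ckpt_config = "output/checkpoint-5000/unet/config.json"   # checkpoint to resume

# Overwrite the checkpoint's config with the known-good one before resuming.
shutil.copyfile(good_config, ckpt_config)
```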