Compressed Stable Diffusion models and research code for efficient text-to-image generation
BK-SDM offers a compressed Stable Diffusion model for efficient text-to-image generation, targeting researchers and developers seeking faster, lighter, and more cost-effective diffusion models. It achieves this by removing specific residual and attention blocks from the U-Net architecture, enabling significant speedups and reduced resource requirements without substantial quality degradation.
How It Works
BK-SDM applies knowledge distillation to train a smaller U-Net derived from Stable Diffusion. Certain residual and attention blocks are removed from the original architecture, and the resulting student is trained on a comparatively small dataset to mimic the frozen teacher's noise predictions and intermediate features. The compressed model retains high-quality generation while drastically reducing computational overhead, making it well suited to resource-constrained devices and faster inference.
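A minimal sketch of that objective, assuming student_unet and a frozen teacher_unet are diffusers UNet2DConditionModel instances and that forward hooks on paired blocks append activations to the two feature lists (hook setup not shown); the loss-weight names lambda_out and lambda_feat are illustrative, not values from the paper:

import torch
import torch.nn.functional as F

def bk_sdm_loss(student_unet, teacher_unet, noisy_latents, timesteps,
                text_emb, noise, student_feats, teacher_feats,
                lambda_out=1.0, lambda_feat=1.0):
    student_feats.clear()
    teacher_feats.clear()

    # Task loss: the compact student still learns the ordinary
    # denoising objective (predict the noise that was added).
    student_pred = student_unet(noisy_latents, timesteps,
                                encoder_hidden_states=text_emb).sample
    task_loss = F.mse_loss(student_pred, noise)

    # Output-level KD: imitate the frozen teacher's noise prediction.
    with torch.no_grad():
        teacher_pred = teacher_unet(noisy_latents, timesteps,
                                    encoder_hidden_states=text_emb).sample
    out_kd = F.mse_loss(student_pred, teacher_pred)

    # Feature-level KD: match activations of corresponding U-Net blocks,
    # captured by the forward hooks during the two passes above.
    feat_kd = sum(F.mse_loss(fs, ft.detach())
                  for fs, ft in zip(student_feats, teacher_feats))

    # Only the pruned student receives gradients; the teacher stays fixed.
    return task_loss + lambda_out * out_kd + lambda_feat * feat_kd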
Quick Start & Requirements
git clone https://github.com/Nota-NetsPresso/BK-SDM.git && cd BK-SDM && pip install -r requirements.txt
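Once installed, the released checkpoints load as drop-in replacements for Stable Diffusion through diffusers. A minimal inference sketch using the published nota-ai/bk-sdm-small checkpoint (prompt and dtype choices are illustrative):

import torch
from diffusers import StableDiffusionPipeline

# Load the compressed model; the pruned U-Net is architecturally
# compatible, so the standard Stable Diffusion pipeline works unchanged.
pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
).to("cuda")

image = pipe("a golden retriever wearing sunglasses, studio photo").images[0]
image.save("result.png")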
Maintenance & Community
The project has been presented at ECCV 2024, ICCV 2023, and ICML 2023. Notable follow-on implementations include Segmind's distilled Stable Diffusion models and KOALA. Community interaction happens through Hugging Face Spaces and the associated demos.
Licensing & Compatibility
The project is released under the CreativeML Open RAIL-M license. This license permits redistribution, commercial use, and use as a service, provided that the same use restrictions are included and a copy of the license is shared with users. It prohibits the deliberate production or sharing of illegal or harmful outputs.
Limitations & Caveats
While effective, the distillation pretraining process can be computationally intensive, requiring substantial GPU resources. The README notes that different batch sizes during evaluation can lead to slightly different generation scores due to variations in random latent code sampling.
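One way to hedge against this during evaluation, not prescribed by the README, is to pre-sample each image's latent code with a per-image seed and pass it through the pipeline's standard latents argument, so the noise no longer depends on how prompts are batched; a sketch:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
).to("cuda")

prompts = ["a red bicycle", "a bowl of ramen", "a lighthouse at dusk"]

# One generator per image, seeded by its index, so each latent is fixed
# regardless of how many prompts are packed into a single batch.
latents = torch.stack([
    torch.randn(pipe.unet.config.in_channels, 64, 64,
                generator=torch.Generator("cpu").manual_seed(i))
    for i in range(len(prompts))
]).to(device="cuda", dtype=torch.float16)

images = pipe(prompts, latents=latents).images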