Image generation research paper (CVPR 2025)
This repository provides the official implementation for "Diffusion Self-Distillation for Zero-Shot Customized Image Generation," a method for fine-tuning text-to-image diffusion models for instance-specific generation tasks. It addresses the challenge of limited paired data for image-to-image tasks by using a pre-trained text-to-image model to generate its own training dataset, enabling zero-shot customization for artists and researchers.
How It Works
The core approach is a two-stage process. First, a pre-trained text-to-image diffusion model generates diverse image grids intended to depict the same subject, and a Vision-Language Model (VLM) curates these grids into a paired dataset. Second, the same diffusion model is fine-tuned on this curated dataset to become a text-and-image-conditional model. This self-distillation process enables instance-preserving generation without test-time optimization, outperforming existing zero-shot methods.
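A minimal sketch of the Stage 1 data-generation and curation loop is shown below; the helper names (generate_grid, vlm_is_consistent, build_dataset) are illustrative stubs, not the repository's actual API.

```python
# Illustrative sketch of the self-distillation data pipeline; all helpers
# here are hypothetical stubs, not the repository's real functions.
from dataclasses import dataclass

@dataclass
class PairedSample:
    reference_image: object  # identity-defining image from the grid
    target_image: object     # same subject in a new context
    prompt: str              # text condition for the target

def generate_grid(prompt: str, n: int = 4) -> list:
    """Stage 1a: a pre-trained text-to-image model renders a grid of
    candidate images that should all depict the same subject."""
    return [f"img_{i}({prompt})" for i in range(n)]  # placeholder outputs

def vlm_is_consistent(img_a, img_b) -> bool:
    """Stage 1b: a Vision-Language Model keeps only pairs that actually
    show the same instance."""
    return True  # placeholder decision

def build_dataset(prompts: list[str]) -> list[PairedSample]:
    dataset = []
    for prompt in prompts:
        grid = generate_grid(prompt)
        reference = grid[0]
        for candidate in grid[1:]:
            if vlm_is_consistent(reference, candidate):
                dataset.append(PairedSample(reference, candidate, prompt))
    return dataset

# Stage 2 (not shown): fine-tune the same diffusion model on these pairs
# so it accepts both a text prompt and a reference image as conditions.
dataset = build_dataset(["a corgi wearing a red scarf"])
print(len(dataset), "curated pairs")
```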
Quick Start & Requirements
```bash
pip install -r requirements.txt
```
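After installing dependencies, inference follows the usual reference-image-plus-prompt pattern. The sketch below assumes a hypothetical generate() helper; the repository's actual script names and arguments may differ.

```python
# Hypothetical usage sketch; generate() stands in for the repository's real
# text-and-image-conditional sampler, whose name and signature may differ.
from PIL import Image

def generate(reference: Image.Image, prompt: str) -> Image.Image:
    # Stub standing in for the fine-tuned diffusion model's inference call.
    return reference

reference = Image.new("RGB", (512, 512))  # stands in for Image.open("subject.png")
result = generate(reference, "the same subject riding a bicycle at sunset")
result.save("output.png")
```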
Highlighted Details
Memory offloading flags (--model_offload and --sequential_offload) are available to reduce GPU memory requirements.
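If the project wraps a Hugging Face diffusers pipeline (an assumption, not something the README confirms), the two flags would plausibly map to the library's standard CPU-offloading modes:

```python
# Sketch of what the offload flags plausibly toggle, assuming a diffusers
# pipeline underneath; this is an assumption, not the repository's confirmed
# implementation. The model id below is a placeholder.
import argparse

from diffusers import DiffusionPipeline

parser = argparse.ArgumentParser()
parser.add_argument("--model_offload", action="store_true")
parser.add_argument("--sequential_offload", action="store_true")
args = parser.parse_args()

pipe = DiffusionPipeline.from_pretrained("some-org/some-model")  # placeholder

if args.sequential_offload:
    # Streams weights to GPU one sub-module at a time: lowest VRAM, slowest.
    pipe.enable_sequential_cpu_offload()
elif args.model_offload:
    # Moves whole components (text encoder, denoiser, VAE) on demand:
    # moderate VRAM savings with a smaller speed penalty.
    pipe.enable_model_cpu_offload()
```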
Maintenance & Community
The project is under active development, with a CVPR 2025 publication. Updates are planned, including training code release and a relighting model. An unofficial ComfyUI integration is available.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The repository is still under construction, with the relighting model in alpha testing. Photorealistic face identity generation is not a primary focus, as dedicated models exist for this task. Training code is not yet released.