diffusion-self-distillation by primecai

Image generation research paper (CVPR 2025)

Created 11 months ago

459 stars

Top 65.9% on SourcePulse

Project Summary

This repository provides the official implementation for "Diffusion Self-Distillation for Zero-Shot Customized Image Generation," a method for fine-tuning text-to-image diffusion models for instance-specific generation tasks. It addresses the challenge of limited paired data for image-to-image tasks by using a pre-trained text-to-image model to generate its own training dataset, enabling zero-shot customization for artists and researchers.

How It Works

The method has two stages. First, a pre-trained text-to-image diffusion model generates diverse image grids, and a Visual-Language Model curates these images into a paired dataset. Second, the same diffusion model is fine-tuned on the curated dataset so that it accepts both text and an image as conditioning. This self-distillation process enables instance-preserving generation without test-time optimization and outperforms existing zero-shot methods.
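The data-curation stage can be pictured with a small sketch. This is an illustration under stated assumptions, not the repository's code: `Pair`, `build_paired_dataset`, `generate_grid`, and `same_instance` are hypothetical names, images are represented by plain strings, and forming ordered pairs from the panels of one grid is an assumed pairing scheme.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Pair:
    """One training example for the text-and-image-conditional model."""
    prompt: str
    reference: str  # conditioning image (a string stands in for pixels)
    target: str     # image the fine-tuned model should learn to produce


def build_paired_dataset(
    prompts: Sequence[str],
    generate_grid: Callable[[str], Sequence[str]],  # stand-in for the diffusion model
    same_instance: Callable[[str, str], bool],      # stand-in for the curation model
) -> List[Pair]:
    """Stage 1: generate a grid per prompt, keep only pairs the curator accepts."""
    pairs: List[Pair] = []
    for prompt in prompts:
        panels = generate_grid(prompt)
        # Assumption: every ordered pair of panels is a candidate example.
        for reference, target in permutations(panels, 2):
            if same_instance(reference, target):
                pairs.append(Pair(prompt, reference, target))
    return pairs


# Toy stand-ins so the sketch runs without any model.
def toy_grid(prompt: str) -> List[str]:
    return ["cat_a#1", "cat_a#2", "dog_b#1"]


def toy_same_instance(a: str, b: str) -> bool:
    return a.split("#")[0] == b.split("#")[0]


dataset = build_paired_dataset(["a cat"], toy_grid, toy_same_instance)
```

In the second stage, the resulting pairs would serve as supervision for fine-tuning: the model receives `prompt` and `reference` and is trained to produce `target`.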

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python, Git, and the downloaded pretrained models. Optional: a Google Gemini API key for prompt enhancement.
Hardware: about 23.7 GB of VRAM recommended; CPU offloading options are available for GPUs with less memory.
Links: ComfyUI-DSD (unofficial integration)

Highlighted Details

Supports object, merchandise, logo, try-on, illustration, comic, manga, anime, and generic character generation.
Outperforms existing zero-shot methods and is competitive with per-instance tuning.
Achieves instance-preserving generation without test-time optimization.
Offers model offloading options (--model_offload, --sequential_offload) to reduce GPU memory requirements.
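As a rough aid for choosing between the two offloading flags, the helper below maps available VRAM to a flag list. It is a sketch under assumptions: `offload_flags` and the 12 GB cutoff are hypothetical, and the idea that `--sequential_offload` saves more memory than `--model_offload` is assumed by analogy with common diffusion tooling rather than confirmed by this summary. Only the 23.7 GB figure and the flag names come from the text above.

```python
from typing import List

RECOMMENDED_VRAM_GB = 23.7  # figure quoted in the hardware note


def offload_flags(free_vram_gb: float, sequential_below_gb: float = 12.0) -> List[str]:
    """Pick command-line flags for a GPU with `free_vram_gb` of memory.

    `sequential_below_gb` is a hypothetical cutoff, not a documented value.
    """
    if free_vram_gb >= RECOMMENDED_VRAM_GB:
        return []  # enough memory: no offloading needed
    if free_vram_gb >= sequential_below_gb:
        return ["--model_offload"]
    # Assumed to be the more aggressive (and likely slower) option.
    return ["--sequential_offload"]


flags_for_16gb_gpu = offload_flags(16.0)
```

The returned list is meant to be appended to whatever inference command the repository documents; the cutoff should be tuned against actual memory usage.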

Maintenance & Community

The project is described as under active development and has a CVPR 2025 publication. Planned updates include a training code release and a relighting model. An unofficial ComfyUI integration is available.

Licensing & Compatibility

The provided README does not state a license, so the terms for commercial use or closed-source linking are unspecified.

Limitations & Caveats

The repository is still under construction, with the relighting model in alpha testing. Photorealistic face identity generation is not a primary focus, as dedicated models exist for this task. Training code is not yet released.

Health Check

Last Commit

9 months ago

Responsiveness

Inactive

Star History

1 star in the last 30 days