diffusion-self-distillation  by primecai

Image generation research paper (CVPR 2025)

created 5 months ago
457 stars

Top 67.1% on sourcepulse

Project Summary

This repository provides the official implementation for "Diffusion Self-Distillation for Zero-Shot Customized Image Generation," a method for fine-tuning text-to-image diffusion models for instance-specific generation tasks. It addresses the challenge of limited paired data for image-to-image tasks by using a pre-trained text-to-image model to generate its own training dataset, enabling zero-shot customization for artists and researchers.

How It Works

The approach has two stages. First, a pre-trained text-to-image diffusion model generates diverse image grids, and a vision-language model (VLM) curates these images into a paired dataset. Second, the original diffusion model is fine-tuned on this curated dataset to become a text-and-image-conditional model. This self-distillation process yields instance-preserving generation without test-time optimization, and the resulting model outperforms existing zero-shot methods.
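The data-generation stage described above can be pictured with a minimal sketch. This is not the repository's code: `Pair`, `build_paired_dataset`, `generate_grid`, and `same_instance` are hypothetical names, and the sketch assumes that each grid is split into panels and that the VLM is asked whether two panels show the same instance. The diffusion model and the VLM are passed in as plain callables so the control flow stays visible.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Iterable, List, Sequence


@dataclass(frozen=True)
class Pair:
    """One curated training example (hypothetical structure)."""
    reference: str  # identifier of the conditioning image, e.g. a file path
    target: str     # identifier of the image the model should learn to produce
    prompt: str     # text prompt that produced the grid


def build_paired_dataset(
    prompts: Iterable[str],
    generate_grid: Callable[[str], Sequence[str]],
    same_instance: Callable[[str, str], bool],
) -> List[Pair]:
    """Collect image pairs that the curator accepts.

    generate_grid stands in for the pre-trained text-to-image model and
    returns the panels of one generated grid (an assumption of this sketch).
    same_instance stands in for the VLM curation step and returns True when
    two panels depict the same subject.
    """
    pairs: List[Pair] = []
    for prompt in prompts:
        panels = generate_grid(prompt)
        # Consider every unordered pair of panels from the same grid.
        for reference, target in combinations(panels, 2):
            if same_instance(reference, target):
                pairs.append(Pair(reference, target, prompt))
    return pairs
```

The resulting list of pairs would then serve as the supervised data for the second stage, where the same diffusion model is fine-tuned to accept a reference image alongside the text prompt.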

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python and Git. The pretrained model weights must be downloaded separately. A Google Gemini API key is optional and is used only for prompt enhancement.
  • Hardware: about 23.7 GB of VRAM is recommended; CPU offloading options are available for GPUs with less memory.
  • Links: ComfyUI-DSD (unofficial integration)

Highlighted Details

  • Supports object, merchandise, logo, try-on, illustration, comic, manga, anime, and generic character generation.
  • Outperforms existing zero-shot methods and is competitive with per-instance tuning.
  • Achieves instance-preserving generation without test-time optimization.
  • Offers model offloading options (--model_offload, --sequential_offload) to reduce GPU memory requirements.
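As a rough illustration of how the two offloading flags might be chosen, the sketch below maps available VRAM to a flag list. Only the flag names and the 23.7 GB figure come from the summary above. The helper `choose_offload_flags`, the 12 GB cutoff, and the assumption that `--sequential_offload` saves more memory than `--model_offload` (as is typical in diffusion tooling, at the cost of speed) are illustrative and not taken from the repository.

```python
from typing import List

# Recommended VRAM from the project summary; no offloading needed at or above this.
RECOMMENDED_VRAM_GB = 23.7


def choose_offload_flags(
    free_vram_gb: float,
    sequential_below_gb: float = 12.0,  # assumed cutoff, not from the repository
) -> List[str]:
    """Return the offloading flags to pass for a given amount of free VRAM."""
    if free_vram_gb >= RECOMMENDED_VRAM_GB:
        return []  # enough memory to keep the whole model on the GPU
    if free_vram_gb >= sequential_below_gb:
        return ["--model_offload"]  # assumed to be the lighter-weight option
    return ["--sequential_offload"]  # assumed to save the most memory, but slower
```

For example, `choose_offload_flags(16.0)` yields `["--model_offload"]`; the returned list can be appended to whatever inference command the repository documents.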

Maintenance & Community

The project is under active development, with a CVPR 2025 publication. Updates are planned, including training code release and a relighting model. An unofficial ComfyUI integration is available.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The repository is still under construction, with the relighting model in alpha testing. Photorealistic face identity generation is not a primary focus, as dedicated models exist for this task. Training code is not yet released.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 29 stars in the last 90 days
