diffusion-self-distillation  by primecai

Image generation research paper (CVPR 2025)

Created 7 months ago
458 stars

Top 66.1% on SourcePulse

Project Summary

This repository provides the official implementation for "Diffusion Self-Distillation for Zero-Shot Customized Image Generation," a method for fine-tuning text-to-image diffusion models for instance-specific generation tasks. It addresses the challenge of limited paired data for image-to-image tasks by using a pre-trained text-to-image model to generate its own training dataset, enabling zero-shot customization for artists and researchers.

How It Works

The approach has two stages. First, a pre-trained text-to-image diffusion model generates diverse image grids, and a vision-language model (VLM) curates them into a paired dataset. Second, the same diffusion model is fine-tuned on the curated pairs to become a text-and-image-conditional model. This self-distillation enables instance-preserving generation without test-time optimization, and the authors report that it outperforms existing zero-shot methods.
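
The data-curation half of the process can be illustrated with a toy sketch. This is not the repository's code: images are plain nested lists, and `split_grid`, `curate_pairs`, and the `same_instance` judge are hypothetical names introduced here. In the real pipeline a vision-language model would play the role of `same_instance`; the fine-tuning stage is not shown.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def split_grid(grid: Image, rows: int, cols: int) -> List[Image]:
    """Cut a grid image into rows*cols equally sized panels, in row-major order."""
    panel_h = len(grid) // rows
    panel_w = len(grid[0]) // cols
    panels = []
    for r in range(rows):
        for c in range(cols):
            panel = [
                row[c * panel_w:(c + 1) * panel_w]
                for row in grid[r * panel_h:(r + 1) * panel_h]
            ]
            panels.append(panel)
    return panels


def curate_pairs(
    panels: List[Image],
    same_instance: Callable[[Image, Image], bool],
) -> List[Tuple[Image, Image]]:
    """Keep ordered (reference, target) pairs that the judge accepts.

    `same_instance` is a stand-in (assumption) for the VLM's verdict on
    whether two panels depict the same subject.
    """
    pairs = []
    for i, ref in enumerate(panels):
        for j, tgt in enumerate(panels):
            if i != j and same_instance(ref, tgt):
                pairs.append((ref, tgt))
    return pairs


# Toy 4x4 "grid" holding four 2x2 panels; panels 0 and 2 are identical.
grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
panels = split_grid(grid, rows=2, cols=2)
pairs = curate_pairs(panels, same_instance=lambda a, b: a == b)
```

Here the equality check is only a placeholder judge; the resulting `pairs` list is what a fine-tuning step would consume as (conditioning image, target image) examples.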

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, Git. Optional: Google Gemini API key for prompt enhancement. Requires downloading pretrained models.
  • Hardware: Recommended ~23.7GB VRAM; options for model offloading to CPU are available for lower VRAM usage.
  • Links: ComfyUI-DSD (unofficial integration)

Highlighted Details

  • Supports object, merchandise, logo, try-on, illustration, comic, manga, anime, and generic character generation.
  • Outperforms existing zero-shot methods and is competitive with per-instance tuning.
  • Achieves instance-preserving generation without test-time optimization.
  • Offers model offloading options (--model_offload, --sequential_offload) to reduce GPU memory requirements.
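
As a rough illustration of how the two offloading flags might be chosen, the sketch below maps free GPU memory to a flag list. The ~23.7GB figure and the flag names come from this summary; everything else is an assumption, in particular the `sequential_below_gb` cutoff and the idea that `--sequential_offload` saves more memory than `--model_offload` (as is the case for the similarly named offload modes in Hugging Face Diffusers). Check the repository's README before relying on it.

```python
from typing import List

RECOMMENDED_VRAM_GB = 23.7  # recommended VRAM quoted in this summary


def offload_flags(free_vram_gb: float, sequential_below_gb: float = 12.0) -> List[str]:
    """Pick offloading flags for a given amount of free GPU memory.

    `sequential_below_gb` is an assumed cutoff for illustration only,
    not a value documented by the project.
    """
    if free_vram_gb >= RECOMMENDED_VRAM_GB:
        return []  # enough memory: no offloading needed
    if free_vram_gb >= sequential_below_gb:
        return ["--model_offload"]
    return ["--sequential_offload"]


# Example: flags to append to an inference command on a 16 GB card.
flags_16gb = offload_flags(16.0)
```

The function only returns the flag strings; which script they are passed to is left to the repository's own instructions.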

Maintenance & Community

The project is under active development, with a CVPR 2025 publication. Updates are planned, including training code release and a relighting model. An unofficial ComfyUI integration is available.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The repository is still under construction, with the relighting model in alpha testing. Photorealistic face identity generation is not a primary focus, as dedicated models exist for this task. Training code is not yet released.

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

IP-Adapter by tencent-ailab

0.3%
6k
Adapter for image prompt in text-to-image diffusion models
Created 2 years ago
Updated 1 year ago