diffusion-self-distillation by primecai

Image generation research paper (CVPR 2025)

Created 1 year ago
461 stars

Top 65.5% on SourcePulse

Project Summary

This repository provides the official implementation for "Diffusion Self-Distillation for Zero-Shot Customized Image Generation," a method for fine-tuning text-to-image diffusion models for instance-specific generation tasks. It addresses the challenge of limited paired data for image-to-image tasks by using a pre-trained text-to-image model to generate its own training dataset, enabling zero-shot customization for artists and researchers.

How It Works

The core approach is a two-stage process. First, a pre-trained text-to-image diffusion model generates diverse image grids, and a Vision-Language Model (VLM) curates these grids into a paired dataset. Second, the same diffusion model is fine-tuned on this curated dataset to become a text-and-image-conditional model. This self-distillation process enables instance-preserving generation without test-time optimization, outperforming existing zero-shot methods.
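The two-stage recipe above can be sketched in miniature. This is an illustrative toy, not the repo's actual code: the function names, the grid size, and the stubbed VLM scorer are all assumptions; the real pipeline uses a diffusion model for generation and a VLM for curation.

```python
# Toy sketch of the two-stage self-distillation recipe (illustrative only;
# function names are hypothetical, not this repository's API).
def generate_grid(prompt, n=4):
    # Stage 1a: the pre-trained text-to-image model generates a grid of
    # candidate images intended to depict the same subject (stubbed as strings).
    return [f"{prompt}_img{i}" for i in range(n)]

def vlm_consistency_score(img_a, img_b):
    # Stage 1b: a Vision-Language Model judges whether two grid cells show
    # the same instance (stubbed with a trivial prompt-matching heuristic).
    return 1.0 if img_a.rsplit("_", 1)[0] == img_b.rsplit("_", 1)[0] else 0.0

def curate_pairs(prompts, threshold=0.5):
    # Keep only adjacent grid cells the VLM rates as the same instance.
    pairs = []
    for p in prompts:
        grid = generate_grid(p)
        for a, b in zip(grid, grid[1:]):
            if vlm_consistency_score(a, b) >= threshold:
                pairs.append((a, b))  # (reference image, target image)
    return pairs

# Stage 2 would fine-tune the diffusion model on these pairs as a
# text-and-image-conditional model; that step is omitted here.
pairs = curate_pairs(["corgi in a spacesuit"])
```

The key design point is that the generator and the fine-tuned model are the same network, so no external paired data is ever needed.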

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python, Git. Optional: Google Gemini API key for prompt enhancement. Requires downloading pretrained models.
  • Hardware: ~23.7 GB of VRAM recommended; model offloading to CPU is available for lower-VRAM setups.
  • Links: ComfyUI-DSD (unofficial integration)

Highlighted Details

  • Supports object, merchandise, logo, try-on, illustration, comic, manga, anime, and generic character generation.
  • Outperforms existing zero-shot methods and is competitive with per-instance tuning.
  • Achieves instance-preserving generation without test-time optimization.
  • Offers model offloading options (--model_offload, --sequential_offload) to reduce GPU memory requirements.
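The two offload flags above trade inference speed for GPU memory. A hypothetical sketch of how such flags might be wired up (the flag names come from the repo's README; the selection logic and the diffusers calls named in comments are assumptions, not the repo's actual implementation):

```python
# Hypothetical sketch: mapping the repo's offload flags to an offload mode.
# In a diffusers-based script, "model" might correspond to
# pipe.enable_model_cpu_offload() and "sequential" to
# pipe.enable_sequential_cpu_offload() -- check the repo's scripts.
import argparse

def pick_offload_mode(args):
    if args.sequential_offload:
        return "sequential"  # lowest VRAM, slowest inference
    if args.model_offload:
        return "model"       # moderate VRAM savings, moderate slowdown
    return "none"            # full GPU residency, ~23.7 GB VRAM

parser = argparse.ArgumentParser()
parser.add_argument("--model_offload", action="store_true")
parser.add_argument("--sequential_offload", action="store_true")

args = parser.parse_args(["--model_offload"])
print(pick_offload_mode(args))  # prints: model
```

Sequential offloading moves individual submodules to GPU one at a time, so it fits on much smaller cards at a significant speed cost; model-level offloading swaps whole components and is the usual middle ground.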

Maintenance & Community

The project is under active development, with a CVPR 2025 publication. Updates are planned, including training code release and a relighting model. An unofficial ComfyUI integration is available.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The repository is still under construction, with the relighting model in alpha testing. Photorealistic face identity generation is not a primary focus, as dedicated models exist for this task. Training code is not yet released.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Omar Sanseviero (DevRel at Google DeepMind).

RPG-DiffusionMaster by YangLing0818

Training-free paradigm for text-to-image generation/editing
2k stars · Created 2 years ago · Updated 1 year ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

IP-Adapter by tencent-ailab

Adapter for image prompt in text-to-image diffusion models
7k stars · Created 2 years ago · Updated 1 year ago