Collaborative-Diffusion by ziqihuangg

Research paper implementation for multi-modal face generation/editing via collaborative diffusion

Created 2 years ago · 428 stars · Top 70.3% on sourcepulse

Project Summary

This repository provides an implementation for Collaborative Diffusion, a method for multi-modal face generation and editing. It allows users to control the synthesis and modification of faces using inputs like text descriptions and segmentation masks, offering high-quality results with identity preservation. The target audience includes researchers and practitioners in computer vision and generative AI.

How It Works

Collaborative Diffusion leverages pre-trained uni-modal diffusion models. During the reverse diffusion process (from noise to image), dynamic diffusers predict spatially and temporally varying influence functions that selectively modulate each modality's contribution at every denoising step, enabling coherent multi-modal control for both generating new faces and editing existing ones.
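Conceptually, each denoising step fuses the uni-modal predictions with an influence-weighted sum. Below is a minimal PyTorch sketch of that idea; `unimodal_models`, `dynamic_diffusers`, and `conditions` are hypothetical interfaces and do not mirror the repository's actual classes.

```python
import torch

def collaborative_denoise_step(x_t, t, unimodal_models, dynamic_diffusers, conditions):
    """Fuse per-modality noise predictions with spatially and temporally
    varying influence maps. Hypothetical interfaces, not the repo's API."""
    eps_list, logit_list = [], []
    for name, model in unimodal_models.items():
        cond = conditions[name]                                    # e.g. text embedding or segmentation mask
        eps_list.append(model(x_t, t, cond))                       # modality-specific noise estimate
        logit_list.append(dynamic_diffusers[name](x_t, t, cond))   # per-pixel influence logits for this modality
    eps = torch.stack(eps_list)                                    # (M, B, C, H, W)
    weights = torch.softmax(torch.stack(logit_list), dim=0)        # normalize influence across modalities
    return (weights * eps).sum(dim=0)                              # collaboratively fused noise prediction
```

Because the weights depend on both the spatial location and the timestep, each modality can dominate where and when it is most informative rather than being mixed with a fixed global ratio.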

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment from environment.yaml, and activate it. Install dependencies with pip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0 git+https://github.com/arogozhnikov/einops.git (a quick version check is sketched after this list).
  • Prerequisites: Python 3.x, PyTorch, CUDA (implied by LDM base), transformers, scann, kornia, torchmetrics, einops. Requires downloading pre-trained checkpoints (VAE, uni-modal diffusion models, collaborative diffusion models) and optionally preprocessed datasets for training.
  • Resources: Inference requires significant GPU memory, especially when producing intermediate outputs. Training requires multiple GPUs (the training instructions reference 4 GPUs).
  • Links: Project page and CVPR 2023 paper are referenced in the README.
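Because the dependency pins above are strict, a quick post-install sanity check can catch version drift early. The optional snippet below uses only the package names and versions quoted in the install command; it is not part of the repository itself.

```python
from importlib.metadata import version, PackageNotFoundError

# Pinned versions from the install command above; einops is installed from git,
# so only its presence (and whatever version it reports) is checked.
PINS = {"transformers": "4.19.2", "kornia": "0.6.4", "torchmetrics": "0.6.0"}

for pkg, expected in PINS.items():
    try:
        found = version(pkg)
        status = "OK" if found == expected else f"expected {expected}"
        print(f"{pkg}: {found} ({status})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")

for pkg in ("scann", "einops", "torch"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```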

Highlighted Details

  • Supports multi-modal face generation and editing using text and segmentation masks.
  • Offers single-modality generation (text-to-face, mask-to-face) as well.
  • Editing capabilities include text-based, mask-based, and collaborative edits using an adapted Imagic implementation (see the sketch after this list).
  • Provides full training pipelines for VAE, uni-modal diffusion models, and dynamic diffusers.
  • Compatible with FreeU for enhanced results (as of Oct 2023 update).
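To make the Imagic-style editing referenced in the list concrete: Imagic first optimizes a text embedding so the model reconstructs the input face, fine-tunes the diffusion model on that embedding, and then interpolates toward the target-text embedding to apply the edit. The toy sketch below covers only the interpolation step, with placeholder tensors and dimensions rather than the repository's API.

```python
import torch

def imagic_interpolate(e_opt: torch.Tensor, e_tgt: torch.Tensor, eta: float = 0.7) -> torch.Tensor:
    """Blend the optimized embedding (faithful to the input face) with the
    target-text embedding (carries the edit). Larger eta -> stronger edit,
    smaller eta -> better identity preservation. Placeholder sketch only."""
    return eta * e_tgt + (1.0 - eta) * e_opt

# Dummy text embeddings; the (77, 768) shape is illustrative, not the repo's encoder size.
e_opt, e_tgt = torch.randn(1, 77, 768), torch.randn(1, 77, 768)
e_edit = imagic_interpolate(e_opt, e_tgt, eta=0.8)  # would be fed to the fine-tuned model's sampler
```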

Maintenance & Community

The codebase is maintained by Ziqi Huang. It builds upon the LDM (Latent Diffusion Models) codebase and utilizes implementations from Imagic. Data sources include CelebA-HQ, CelebA-Dialog, CelebAMask-HQ, and MM-CelebA-HQ-Dataset.

Licensing & Compatibility

The repository does not explicitly state a license in the README. It is built on the LDM codebase, which carries its own permissive license, but the terms governing this project and its released checkpoints are not detailed.

Limitations & Caveats

The README does not specify a license, which may impact commercial use or integration into closed-source projects. Producing intermediate results during inference can be memory-intensive, requiring careful configuration (e.g., batch_size=1, reduced ddim_steps).
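As a rough illustration of such a memory-conscious setup, the sketch below assumes the LDM-style DDIMSampler interface this codebase builds on; `model`, `cond`, and the latent shape are placeholders, not the repository's actual inference entry point.

```python
from ldm.models.diffusion.ddim import DDIMSampler  # interface from the LDM codebase this repo builds on

def sample_low_memory(model, cond, ddim_steps=50, latent_shape=(3, 64, 64)):
    """Memory-conscious sampling: one image per call with reduced DDIM steps.
    `model` is an already-loaded latent diffusion model and `cond` its
    (multi-modal) conditioning -- placeholders for illustration only."""
    sampler = DDIMSampler(model)
    samples, _ = sampler.sample(
        S=ddim_steps,              # fewer steps than the default cuts compute and memory
        batch_size=1,              # one image at a time limits peak GPU memory
        shape=list(latent_shape),  # latent (channels, height, width); depends on the checkpoint
        conditioning=cond,
        verbose=False,
    )
    return model.decode_first_stage(samples)  # decode latents back to image space
```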

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 3 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

  • Latent text-to-image diffusion model
  • Top 0.1% · 71k stars
  • Created 3 years ago, updated 1 year ago