Collaborative-Diffusion by ziqihuangg

Research paper implementation for multi-modal face generation/editing via collaborative diffusion

Created 2 years ago · 428 stars · Top 70.3% on sourcepulse

Project Summary

This repository provides an implementation for Collaborative Diffusion, a method for multi-modal face generation and editing. It allows users to control the synthesis and modification of faces using inputs like text descriptions and segmentation masks, offering high-quality results with identity preservation. The target audience includes researchers and practitioners in computer vision and generative AI.

How It Works

Collaborative Diffusion leverages pre-trained uni-modal diffusion models. During the reverse diffusion process (from noise to image), dynamic diffusers predict spatially and temporally varying influence functions that selectively modulate each modality's contribution at every denoising step, enabling coherent multi-modal control for both generating new faces and editing existing ones.
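Conceptually, each denoising step fuses the uni-modal predictions with an influence-weighted sum. Below is a minimal PyTorch sketch of that idea; `unimodal_models`, `dynamic_diffusers`, and `conditions` are hypothetical interfaces and do not mirror the repository's actual classes.

```python
import torch

def collaborative_denoise_step(x_t, t, unimodal_models, dynamic_diffusers, conditions):
    """Fuse per-modality noise predictions with spatially and temporally
    varying influence maps. Hypothetical interfaces, not the repo's API."""
    eps_list, logit_list = [], []
    for name, model in unimodal_models.items():
        cond = conditions[name]                                    # e.g. text embedding or segmentation mask
        eps_list.append(model(x_t, t, cond))                       # modality-specific noise estimate
        logit_list.append(dynamic_diffusers[name](x_t, t, cond))   # per-pixel influence logits for this modality
    eps = torch.stack(eps_list)                                    # (M, B, C, H, W)
    weights = torch.softmax(torch.stack(logit_list), dim=0)        # normalize influence across modalities
    return (weights * eps).sum(dim=0)                              # collaboratively fused noise prediction
```

Because the weights depend on both the spatial location and the timestep, each modality can dominate where and when it is most informative rather than being mixed with a fixed global ratio.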

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment from environment.yaml, and activate it. Install dependencies with pip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0 git+https://github.com/arogozhnikov/einops.git (a quick version check is sketched after this list).
  • Prerequisites: Python 3.x, PyTorch, CUDA (implied by LDM base), transformers, scann, kornia, torchmetrics, einops. Requires downloading pre-trained checkpoints (VAE, uni-modal diffusion models, collaborative diffusion models) and optionally preprocessed datasets for training.
  • Resources: Inference requires significant GPU memory, especially when producing intermediate outputs. Training requires multiple GPUs (the training instructions reference 4 GPUs).
  • Links: Project page and CVPR 2023 paper are referenced in the README.
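Because the dependency pins above are strict, a quick post-install sanity check can catch version drift early. The optional snippet below uses only the package names and versions quoted in the install command; it is not part of the repository itself.

```python
from importlib.metadata import version, PackageNotFoundError

# Pinned versions from the install command above; einops is installed from git,
# so only its presence (and whatever version it reports) is checked.
PINS = {"transformers": "4.19.2", "kornia": "0.6.4", "torchmetrics": "0.6.0"}

for pkg, expected in PINS.items():
    try:
        found = version(pkg)
        status = "OK" if found == expected else f"expected {expected}"
        print(f"{pkg}: {found} ({status})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")

for pkg in ("scann", "einops", "torch"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```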

Highlighted Details

  • Supports multi-modal face generation and editing using text and segmentation masks.
  • Offers single-modality generation (text-to-face, mask-to-face) as well.
  • Editing capabilities include text-based, mask-based, and collaborative edits using an adapted Imagic implementation (see the sketch after this list).
  • Provides full training pipelines for VAE, uni-modal diffusion models, and dynamic diffusers.
  • Compatible with FreeU for enhanced results (as of Oct 2023 update).
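To make the Imagic-style editing referenced in the list concrete: Imagic first optimizes a text embedding so the model reconstructs the input face, fine-tunes the diffusion model on that embedding, and then interpolates toward the target-text embedding to apply the edit. The toy sketch below covers only the interpolation step, with placeholder tensors and dimensions rather than the repository's API.

```python
import torch

def imagic_interpolate(e_opt: torch.Tensor, e_tgt: torch.Tensor, eta: float = 0.7) -> torch.Tensor:
    """Blend the optimized embedding (faithful to the input face) with the
    target-text embedding (carries the edit). Larger eta -> stronger edit,
    smaller eta -> better identity preservation. Placeholder sketch only."""
    return eta * e_tgt + (1.0 - eta) * e_opt

# Dummy text embeddings; the (77, 768) shape is illustrative, not the repo's encoder size.
e_opt, e_tgt = torch.randn(1, 77, 768), torch.randn(1, 77, 768)
e_edit = imagic_interpolate(e_opt, e_tgt, eta=0.8)  # would be fed to the fine-tuned model's sampler
```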

Maintenance & Community

The codebase is maintained by Ziqi Huang. It builds upon the LDM (Latent Diffusion Models) codebase and utilizes implementations from Imagic. Data sources include CelebA-HQ, CelebA-Dialog, CelebAMask-HQ, and MM-CelebA-HQ-Dataset.

Licensing & Compatibility

The repository does not explicitly state a license in the README. It is built on the LDM codebase, which carries its own permissive license, but the terms governing this project and its released checkpoints are not detailed.

Limitations & Caveats

The README does not specify a license, which may impact commercial use or integration into closed-source projects. Producing intermediate results during inference can be memory-intensive, requiring careful configuration (e.g., batch_size=1, reduced ddim_steps).
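As a rough illustration of such a memory-conscious setup, the sketch below assumes the LDM-style DDIMSampler interface this codebase builds on; `model`, `cond`, and the latent shape are placeholders, not the repository's actual inference entry point.

```python
from ldm.models.diffusion.ddim import DDIMSampler  # interface from the LDM codebase this repo builds on

def sample_low_memory(model, cond, ddim_steps=50, latent_shape=(3, 64, 64)):
    """Memory-conscious sampling: one image per call with reduced DDIM steps.
    `model` is an already-loaded latent diffusion model and `cond` its
    (multi-modal) conditioning -- placeholders for illustration only."""
    sampler = DDIMSampler(model)
    samples, _ = sampler.sample(
        S=ddim_steps,              # fewer steps than the default cuts compute and memory
        batch_size=1,              # one image at a time limits peak GPU memory
        shape=list(latent_shape),  # latent (channels, height, width); depends on the checkpoint
        conditioning=cond,
        verbose=False,
    )
    return model.decode_first_stage(samples)  # decode latents back to image space
```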

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 3 stars in the last 90 days

Explore Similar Projects

Starred by Dan Abramov (Core Contributor to React), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 28 more.

stable-diffusion by CompVis

  • Latent text-to-image diffusion model
  • Top 0.1% · 71k stars
  • Created 3 years ago, updated 1 year ago