Collaborative-Diffusion by ziqihuangg

Research paper implementation for multi-modal face generation/editing via collaborative diffusion

Created 2 years ago
430 stars

Top 69.0% on SourcePulse

View on GitHub
Project Summary

This repository provides an implementation for Collaborative Diffusion, a method for multi-modal face generation and editing. It allows users to control the synthesis and modification of faces using inputs like text descriptions and segmentation masks, offering high-quality results with identity preservation. The target audience includes researchers and practitioners in computer vision and generative AI.

How It Works

Collaborative Diffusion leverages pre-trained uni-modal diffusion models. During the reverse diffusion process (from noise to image), dynamic diffusers predict spatially and temporally varying influence functions. These functions selectively modulate each modality's contribution at every denoising step, enabling coherent multi-modal control for both generating new faces and editing existing ones.
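The step-wise collaboration can be pictured with the minimal sketch below. It is an illustration under assumed names (text_model, mask_model, dynamic_diffuser), not the repository's actual API: each uni-modal model predicts its own noise estimate, and the dynamic diffuser outputs per-pixel, per-timestep influence maps used to blend them.

```python
import torch.nn.functional as F

def collaborative_denoise_step(x_t, t, text_cond, mask_cond,
                               text_model, mask_model, dynamic_diffuser):
    """One reverse-diffusion step blending two uni-modal noise estimates."""
    eps_text = text_model(x_t, t, text_cond)   # (B, C, H, W) noise predicted from text
    eps_mask = mask_model(x_t, t, mask_cond)   # (B, C, H, W) noise predicted from mask

    # The dynamic diffuser outputs one influence logit per modality; the maps
    # vary over space (H, W) and over the timestep t.
    logits = dynamic_diffuser(x_t, t, text_cond, mask_cond)   # (B, 2, H, W)
    influence = F.softmax(logits, dim=1)                      # normalize across modalities

    eps = influence[:, 0:1] * eps_text + influence[:, 1:2] * eps_mask
    return eps  # plugged into the usual DDIM/DDPM update to obtain x_{t-1}
```

Normalizing the influence maps across modalities keeps the blended noise estimate on the same scale as each uni-modal prediction, so the standard sampler update can be reused unchanged.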

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment from environment.yaml, and activate it. Install the remaining dependencies with pip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0 git+https://github.com/arogozhnikov/einops.git. A quick environment check is sketched after this list.
  • Prerequisites: Python 3.x, PyTorch, CUDA (implied by LDM base), transformers, scann, kornia, torchmetrics, einops. Requires downloading pre-trained checkpoints (VAE, uni-modal diffusion models, collaborative diffusion models) and optionally preprocessed datasets for training.
  • Resources: Inference requires significant GPU memory, especially when saving intermediate outputs. Training requires multiple GPUs (the training instructions reference 4 GPUs).
  • Links: Project Page (implied by CVPR 2023 mention).
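After installation, a sanity check along these lines can confirm the environment matches the pinned versions; the version comments mirror the install command above, and the CUDA assertion reflects the GPU requirement noted in the Resources bullet.

```python
# Minimal environment check, assuming the conda env from environment.yaml is active.
import torch
import transformers
import kornia
import torchmetrics

assert torch.cuda.is_available(), "a CUDA-capable GPU is expected for inference"
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("transformers:", transformers.__version__)   # expected 4.19.2
print("kornia:", kornia.__version__)               # expected 0.6.4
print("torchmetrics:", torchmetrics.__version__)   # expected 0.6.0
```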

Highlighted Details

  • Supports multi-modal face generation and editing using text and segmentation masks.
  • Offers single-modality generation (text-to-face, mask-to-face) as well.
  • Editing capabilities include text-based, mask-based, and collaborative edits using an adapted Imagic implementation.
  • Provides full training pipelines for VAE, uni-modal diffusion models, and dynamic diffusers.
  • Compatible with FreeU for enhanced results (as of Oct 2023 update).

Maintenance & Community

The codebase is maintained by Ziqi Huang. It builds upon the LDM (Latent Diffusion Models) codebase and utilizes implementations from Imagic. Data sources include CelebA-HQ, CelebA-Dialog, CelebAMask-HQ, and MM-CelebA-HQ-Dataset.

Licensing & Compatibility

The repository does not explicitly state a license in the README. It builds on the LDM (Latent Diffusion Models) codebase, which is released under a permissive license, but the specific terms for this project are not detailed.

Limitations & Caveats

The README does not specify a license, which may impact commercial use or integration into closed-source projects. Producing intermediate results during inference can be memory-intensive, requiring careful configuration (e.g., batch_size=1, reduced ddim_steps).
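As a hedged illustration of that caveat, a low-memory inference loop might look like the sketch below; generate_face and its parameter names are hypothetical placeholders rather than the repository's actual interface, and the batch_size and ddim_steps values mirror the suggestions above.

```python
import torch

def run_low_memory(samples, generate_face, batch_size=1, ddim_steps=10):
    """Generate samples one at a time with reduced DDIM steps to limit GPU memory use."""
    results = []
    with torch.no_grad():                        # no gradients needed at inference time
        for text, mask in samples:
            image = generate_face(text, mask,
                                  batch_size=batch_size,
                                  ddim_steps=ddim_steps)
            results.append(image.cpu())          # move finished samples off the GPU
            torch.cuda.empty_cache()             # release cached memory between samples
    return results
```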

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Jiaming Song (Chief Scientist at Luma AI) and Alex Yu (Research Scientist at OpenAI; former cofounder of Luma AI).

oft by zqiu24
Research paper on orthogonal finetuning for text-to-image diffusion models
294 stars, top 0.3%, created 2 years ago, updated 2 weeks ago