Research paper implementation for multi-modal face generation/editing via Collaborative Diffusion
Top 70.3% on sourcepulse
This repository provides an implementation for Collaborative Diffusion, a method for multi-modal face generation and editing. It allows users to control the synthesis and modification of faces using inputs like text descriptions and segmentation masks, offering high-quality results with identity preservation. The target audience includes researchers and practitioners in computer vision and generative AI.
How It Works
Collaborative Diffusion leverages pre-trained uni-modal diffusion models. During the reverse diffusion process (from noise to image), it employs dynamic diffusers that predict spatially and temporally varying influence functions. These functions selectively modulate the contributions of the different modalities at each denoising step, enabling coherent multi-modal control for both generating new faces and editing existing ones.
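The sketch below is a minimal illustration of this fusion idea, not the repository's actual code: the DynamicDiffuser module, the softmax normalization of influences, and the latent shapes are all assumptions. Each modality's noise prediction is weighted by a learned per-pixel, per-timestep influence map before the fused prediction drives the reverse-diffusion step.

```python
import torch
import torch.nn as nn


class DynamicDiffuser(nn.Module):
    """Hypothetical sketch: predicts a per-pixel, per-timestep influence map
    for one modality (e.g. text or mask) from the noisy latent and timestep."""

    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels + 1, 32, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the (normalized) timestep as an extra input channel.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:]).float()
        return self.net(torch.cat([x_t, t_map], dim=1))  # (B, 1, H, W) logits


def collaborative_eps(x_t, t, unimodal_eps, dynamic_diffusers):
    """Fuse per-modality noise predictions with softmax-normalized influences.

    unimodal_eps: list of (B, C, H, W) epsilon predictions, one per modality.
    dynamic_diffusers: list of DynamicDiffuser modules, one per modality.
    """
    logits = torch.stack([d(x_t, t) for d in dynamic_diffusers], dim=0)  # (M, B, 1, H, W)
    weights = torch.softmax(logits, dim=0)   # influences sum to 1 at every pixel and step
    eps = torch.stack(unimodal_eps, dim=0)   # (M, B, C, H, W)
    return (weights * eps).sum(dim=0)        # fused prediction for the reverse step


if __name__ == "__main__":
    B, C, H, W = 2, 4, 32, 32
    x_t = torch.randn(B, C, H, W)
    t = torch.full((B,), 0.5)                               # normalized timestep
    diffusers = [DynamicDiffuser(C) for _ in range(2)]      # e.g. text + mask branches
    eps_preds = [torch.randn(B, C, H, W) for _ in range(2)]
    fused = collaborative_eps(x_t, t, eps_preds, diffusers)
    print(fused.shape)  # torch.Size([2, 4, 32, 32])
```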
Quick Start & Requirements
Create the conda environment from environment.yaml and activate it, then install the additional dependencies with pip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0 git+https://github.com/arogozhnikov/einops.git.
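As a quick, hypothetical sanity check (not part of the repository), the pinned versions can be verified inside the activated environment:

```python
# Hypothetical sanity check (not from the repository): confirm the pinned
# dependencies from the install command resolve inside the activated environment.
import importlib.metadata as md

for pkg, pinned in [("transformers", "4.19.2"),
                    ("kornia", "0.6.4"),
                    ("torchmetrics", "0.6.0")]:
    installed = md.version(pkg)
    print(f"{pkg} {installed}" + ("" if installed == pinned else f" (expected {pinned})"))

import einops
import scann  # an ImportError here means the pip install step failed
print("einops", einops.__version__)
```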
Highlighted Details
Maintenance & Community
The codebase is maintained by Ziqi Huang. It builds upon the LDM (Latent Diffusion Models) codebase and utilizes implementations from Imagic. Data sources include CelebA-HQ, CelebA-Dialog, CelebAMask-HQ, and MM-CelebA-HQ-Dataset.
Licensing & Compatibility
The repository does not explicitly state a license in the README. It builds on the LDM (latent-diffusion) codebase, which is MIT-licensed, but this project's own licensing terms are not detailed.
Limitations & Caveats
The README does not specify a license, which may complicate commercial use or integration into closed-source projects. Producing intermediate results during inference can be memory-intensive and requires careful configuration (e.g., batch_size=1 and a reduced ddim_steps).
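As a rough, hypothetical illustration of why intermediates add up (the image shape and float32 storage are assumptions, not figures from the repository):

```python
# Rough, hypothetical estimate (not from the repository): memory needed to keep
# every intermediate decoded image in float32 during DDIM sampling.
def intermediates_gb(ddim_steps: int, batch_size: int,
                     channels: int = 3, height: int = 512, width: int = 512) -> float:
    bytes_per_value = 4  # float32
    return ddim_steps * batch_size * channels * height * width * bytes_per_value / 2**30

print(f"{intermediates_gb(200, 4):.2f} GB")  # ~2.3 GB of intermediates alone
print(f"{intermediates_gb(50, 1):.2f} GB")   # far smaller with batch_size=1, fewer steps
```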