FreeDoM by yujiwen

ICCV 2023 paper implementing training-free conditional diffusion

created 2 years ago
302 stars

Top 89.3% on sourcepulse

Project Summary

FreeDoM is a training-free method for controlling unconditional diffusion models with conditions such as text, sketches, and face IDs. By leveraging pre-trained networks to guide the diffusion process, it enables conditional generation across domains including human faces and ImageNet, offering a flexible approach to controlled image synthesis.

How It Works

FreeDoM constructs a time-independent energy function using off-the-shelf pre-trained networks. This function quantifies the discrepancy between intermediate generated images and desired conditions. By computing the gradient of this energy function, FreeDoM guides the diffusion sampling process, allowing for condition-specific generation without requiring model fine-tuning.
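The guidance idea above can be illustrated with a minimal 1-D toy: a quadratic energy E(x) = (x - c)^2 stands in for the distance an off-the-shelf network would measure against a condition c, and its gradient is subtracted at every sampling step while decaying noise is re-injected. All names here are illustrative, and the unconditional denoiser update of a real diffusion model is abstracted away; this is a sketch of the principle, not FreeDoM's actual code.

```python
import random

def energy_grad(x, c):
    # Gradient of a toy energy E(x) = (x - c)^2. In FreeDoM this role is
    # played by the gradient of a discrepancy computed with a pre-trained
    # network (e.g. CLIP similarity, face-ID distance) on the intermediate image.
    return 2.0 * (x - c)

def guided_sample(c, steps=200, scale=0.05, seed=0):
    """Toy energy-guided sampling loop: each step subtracts the energy
    gradient (condition guidance) and re-injects noise whose magnitude
    decays as the step counter t goes from steps down to 1."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                           # start from pure noise
    for t in range(steps, 0, -1):
        x -= scale * energy_grad(x, c)                # steer toward the condition
        x += 0.1 * (t / steps) * rng.gauss(0.0, 1.0)  # decaying residual noise
    return x

sample = guided_sample(c=3.0)
```

Because the guidance gradient is all that is needed, the same loop works with any differentiable energy, which is why FreeDoM can swap in different off-the-shelf networks (text, sketch, face ID) without retraining the diffusion model.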

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch, CUDA (tested on RTX 3090), and specific pre-trained models for guidance (CLIP, face parsing, sketch, landmark, ArcFace).
  • Resources: Sampling times range from ~20s to ~140s per image on an RTX 3090, depending on the model and conditions.
  • Links: Paper, Supplementary

Highlighted Details

  • Supports diverse conditions: text, segmentation maps, sketches, landmarks, face IDs, and style images.
  • Applicable to various domains: human faces, ImageNet, and latent codes.
  • Integrates with SDEdit, guided-diffusion, Stable Diffusion, and ControlNet.
  • Achieves conditional generation with sampling times comparable to existing methods.

Maintenance & Community

The project is the official implementation for an ICCV 2023 paper. The README indicates ongoing development with completed tasks including code release for human face models, ControlNet integration, and Stable Diffusion style guidance. No specific community links (Discord, Slack) are provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it heavily relies on and acknowledges other open-source projects like SDEdit, guided-diffusion, Stable Diffusion, and ControlNet, which have their own licenses. Compatibility for commercial use would require verifying the licenses of all dependencies.

Limitations & Caveats

The README does not detail specific limitations or known bugs. The sampling times are reported on a single GPU (RTX 3090), and performance on different hardware may vary. The project is presented as an official implementation of a research paper, suggesting it may primarily focus on research use cases.

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 1
Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai

0.0%
6k
PyTorch code for consistency models research paper
created 2 years ago
updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 3 more.

guided-diffusion by openai

0.2%
7k
Image synthesis codebase for diffusion models
created 4 years ago
updated 1 year ago