FreeDoM by yujiwen

ICCV 2023 paper implementing training-free conditional diffusion

created 2 years ago
302 stars

Top 89.3% on sourcepulse

Project Summary

FreeDoM is a training-free method for controlling unconditional diffusion models with conditions such as text, sketches, and face IDs. By leveraging pre-trained networks to guide the diffusion process, it enables conditional generation across domains including human faces and ImageNet, offering a flexible approach to controlled image synthesis.

How It Works

FreeDoM constructs a time-independent energy function using off-the-shelf pre-trained networks. This function quantifies the discrepancy between intermediate generated images and desired conditions. By computing the gradient of this energy function, FreeDoM guides the diffusion sampling process, allowing for condition-specific generation without requiring model fine-tuning.
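The guidance idea above can be illustrated with a minimal 1-D toy: a quadratic energy E(x) = (x - c)^2 stands in for the distance an off-the-shelf network would measure against a condition c, and its gradient is subtracted at every sampling step while decaying noise is re-injected. All names here are illustrative, and the unconditional denoiser update of a real diffusion model is abstracted away; this is a sketch of the principle, not FreeDoM's actual code.

```python
import random

def energy_grad(x, c):
    # Gradient of a toy energy E(x) = (x - c)^2. In FreeDoM this role is
    # played by the gradient of a discrepancy computed with a pre-trained
    # network (e.g. CLIP similarity, face-ID distance) on the intermediate image.
    return 2.0 * (x - c)

def guided_sample(c, steps=200, scale=0.05, seed=0):
    """Toy energy-guided sampling loop: each step subtracts the energy
    gradient (condition guidance) and re-injects noise whose magnitude
    decays as the step counter t goes from steps down to 1."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)                           # start from pure noise
    for t in range(steps, 0, -1):
        x -= scale * energy_grad(x, c)                # steer toward the condition
        x += 0.1 * (t / steps) * rng.gauss(0.0, 1.0)  # decaying residual noise
    return x

sample = guided_sample(c=3.0)
```

Because the guidance gradient is all that is needed, the same loop works with any differentiable energy, which is why FreeDoM can swap in different off-the-shelf networks (text, sketch, face ID) without retraining the diffusion model.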

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch, CUDA (tested on RTX 3090), and specific pre-trained models for guidance (CLIP, face parsing, sketch, landmark, ArcFace).
  • Resources: Sampling times range from ~20s to ~140s per image on an RTX 3090, depending on the model and conditions.
  • Links: Paper, Supplementary

Highlighted Details

  • Supports diverse conditions: text, segmentation maps, sketches, landmarks, face IDs, and style images.
  • Applicable to various domains: human faces, ImageNet, and latent codes.
  • Integrates with SDEdit, guided-diffusion, Stable Diffusion, and ControlNet.
  • Achieves conditional generation with sampling times comparable to existing methods.

Maintenance & Community

The project is the official implementation for an ICCV 2023 paper. The README indicates ongoing development with completed tasks including code release for human face models, ControlNet integration, and Stable Diffusion style guidance. No specific community links (Discord, Slack) are provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it heavily relies on and acknowledges other open-source projects like SDEdit, guided-diffusion, Stable Diffusion, and ControlNet, which have their own licenses. Compatibility for commercial use would require verifying the licenses of all dependencies.

Limitations & Caveats

The README does not detail specific limitations or known bugs. The sampling times are reported on a single GPU (RTX 3090), and performance on different hardware may vary. The project is presented as an official implementation of a research paper, suggesting it may primarily focus on research use cases.

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 1
Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai

0.0%
6k
PyTorch code for consistency models research paper
created 2 years ago
updated 1 year ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 3 more.

guided-diffusion by openai

0.2%
7k
Image synthesis codebase for diffusion models
created 4 years ago
updated 1 year ago