ComfyUI-DeepFuze by SamKhoze

ComfyUI nodes for deep learning-based face manipulation

Created 1 year ago

449 stars

Top 66.9% on SourcePulse

Project Summary

DeepFuze is a ComfyUI custom node suite for advanced facial transformation, including lipsyncing, face swapping, and voice cloning. It targets content creators, animators, and developers seeking to enhance video projects with AI-driven realism and synchronization, offering a powerful offline solution.

How It Works

DeepFuze integrates advanced deep learning models within the ComfyUI node-based workflow. It leverages specialized nodes for tasks like lipsync generation, face swapping, and voice cloning, allowing users to combine audio and video with precise facial movement synchronization. The architecture supports various face detection models (YOLOFace, RetinaFace, SCRFD, YuNet) and includes optional enhancers for improved output quality.

Quick Start & Requirements

Installation: Install via ComfyUI Manager by searching for "DeepFuze" or by cloning the Git URL: https://github.com/SamKhoze/ComfyUI-DeepFuze.git.
Prerequisites:
- Windows: Visual Studio (Community or Build Tools with "Desktop Development with C++"). CUDA Toolkit 11.8 and cuDNN 8.9.2.26 are recommended for significant speed increases but are not beginner-friendly to install.
- macOS: PyTorch with MPS support, with the fallback environment variable set via export PYTORCH_ENABLE_MPS_FALLBACK=1. onnxruntime (CPU version), dlib, and TTS must also be installed.
- ComfyUI-VideoHelperSuite is required for audio/video loading.
Setup: Manual CUDA installation on Windows can be time-consuming and complex. macOS installation requires specific environment variable setup and package installations.
Links: ComfyUI-VideoHelperSuite
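
The macOS prerequisites above can be checked before launching ComfyUI. The following is a minimal sketch, not part of DeepFuze: the function names are hypothetical, and the import names (torch, onnxruntime, dlib, TTS) are assumed to match the packages listed above. It only reports what it finds and installs nothing.

```python
import importlib.util
import os

# Assumption: these import names correspond to the packages listed in the
# prerequisites; adjust them if your installation differs.
REQUIRED_MODULES = ("torch", "onnxruntime", "dlib", "TTS")
MPS_FALLBACK_VAR = "PYTORCH_ENABLE_MPS_FALLBACK"


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located (without importing it)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def check_macos_prerequisites(env=None) -> dict:
    """Report which prerequisites look satisfied; nothing is installed or changed."""
    env = os.environ if env is None else env
    report = {name: module_available(name) for name in REQUIRED_MODULES}
    # The variable must be exported in the shell that starts ComfyUI.
    report[MPS_FALLBACK_VAR] = env.get(MPS_FALLBACK_VAR) == "1"
    return report


if __name__ == "__main__":
    for item, ok in check_macos_prerequisites().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

Run it with the same Python interpreter that ComfyUI uses; a different interpreter may report different results.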

Highlighted Details

Supports 17 languages for voice cloning.
Offers an OpenAI LLM integration node for dialogue generation.
Includes a face detector model comparison table (YOLOFace, RetinaFace, SCRFD, YuNet) detailing speed, accuracy, and use cases.
Provides a programmatic Python API for direct integration.

Maintenance & Community

The project is developed by Dr. Sam Khoze and his team. Links to community channels or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The code is released under an unspecified open-source license. It is stated to be free for personal, research, academic, and commercial use, with a caution to comply with applicable laws and use responsibly.

Limitations & Caveats

Installing CUDA on Windows is noted as not beginner-friendly, and macOS installation requires manual steps outside ComfyUI Manager. The TTS node may have sample rate issues with certain audio formats; a converter node is in development. The OpenAI LLM node requires the API key to be entered manually on each use unless it is set as an environment variable.
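
The API key behavior described above (manual entry unless an environment variable is set) can be sketched as follows. This is an illustration under assumptions, not DeepFuze's own code: resolve_api_key is a hypothetical helper, and OPENAI_API_KEY is the variable name conventionally read by OpenAI's client libraries, which may or may not be the name the node expects.

```python
import os

# Assumption: OPENAI_API_KEY is the conventional variable name for OpenAI
# clients; the DeepFuze node may read a different name.
API_KEY_VAR = "OPENAI_API_KEY"


def resolve_api_key(explicit_key=None, env=None) -> str:
    """Prefer a key typed in by the user, else fall back to the environment."""
    env = os.environ if env is None else env
    key = (explicit_key or "").strip() or env.get(API_KEY_VAR, "").strip()
    if not key:
        raise ValueError(
            f"No API key given and {API_KEY_VAR} is not set in the environment."
        )
    return key
```

Exporting the variable once in the shell that launches ComfyUI avoids re-entering the key on each use and keeps it out of saved workflow files.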

Health Check

Last Commit: 5 months ago

Responsiveness: Inactive

Star History

1 star in the last 30 days