ComfyUI_Sonic by smthemex

ComfyUI nodes for audio-driven portrait animation, based on a research paper

created 5 months ago
1,062 stars

Top 36.3% on sourcepulse

Project Summary

This repository provides the ComfyUI custom node "ComfyUI_Sonic" for audio-driven portrait animation, enabling users to generate animated faces synchronized with audio input. It leverages global audio perception for more realistic and nuanced facial movements, targeting users of ComfyUI interested in AI-powered animation and content creation.

How It Works

Sonic combines audio processing with Stable Video Diffusion (SVD) to drive facial animation. Rather than mapping audio to mouth shapes frame by frame, it extracts features from the audio as a whole and uses them to condition the animation, aiming for a more holistic audio-to-visual mapping than conventional lip-sync. The pipeline uses whisper-tiny for audio feature extraction, SVD for video generation, and RIFE for frame interpolation.
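To make the audio-to-frame mapping more concrete, here is a minimal sketch of how each video frame could be assigned a window of Whisper encoder features. It assumes the standard Whisper front end (1,500 encoder positions per 30-second window, i.e. 50 per second). The function `frame_feature_window` and its `context` parameter are hypothetical and not taken from Sonic's code; the context window only illustrates letting neighbouring audio influence a frame and is not Sonic's actual global-perception mechanism.

```python
WHISPER_ENCODER_RATE = 50  # Whisper encoder positions per second (1500 per 30 s window)


def frame_feature_window(frame_idx: int, fps: int, context: int = 2) -> tuple[int, int]:
    """Return the [start, end) range of Whisper encoder positions for one video frame.

    Hypothetical helper: `context` adds extra positions on each side so that
    neighbouring audio also contributes to the frame's conditioning.
    Integer arithmetic avoids floating-point rounding at frame boundaries.
    """
    start = (frame_idx * WHISPER_ENCODER_RATE) // fps              # floor division
    end = -(-((frame_idx + 1) * WHISPER_ENCODER_RATE) // fps)      # ceiling division
    return max(start - context, 0), end + context


# 25 fps gives exactly 2 encoder positions per frame; 24 fps does not divide evenly.
print(frame_feature_window(10, fps=25, context=0))  # (20, 22)
print(frame_feature_window(1, fps=24, context=0))   # (2, 5)
```

At 24 fps the windows of adjacent frames overlap by one position, which is one way a frame rate other than 25 fps can complicate batching.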

Quick Start & Requirements

  • Installation: Clone the repository into ./ComfyUI/custom_nodes.
  • Requirements: Run pip install -r requirements.txt.
  • Models: Download checkpoints (audio2bucket.pth, audio2token.pth, unet.pth, yoloface_v5m.pt, whisper-tiny/, RIFE/flownet.pkl) and SVD checkpoints (svd_xt.safetensors or svd_xt_1_1.safetensors).
  • Hardware: Runs on CUDA and MPS (Mac). On GPUs with 12GB of VRAM, out-of-memory (OOM) errors can occur; reducing image_size is the suggested fix.
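Before launching ComfyUI, a small pre-flight check can confirm that the checkpoints listed above are in place. This is a sketch only: the file names come from the list above, but the helper `missing_checkpoints` is hypothetical and the directories you pass in (for example `ComfyUI/models/sonic` and `ComfyUI/models/checkpoints`) are assumptions; check the repository README for the layout the nodes actually expect.

```python
from pathlib import Path

# File names from the summary; where they must live is an assumption.
SONIC_FILES = [
    "audio2bucket.pth",
    "audio2token.pth",
    "unet.pth",
    "yoloface_v5m.pt",
    "whisper-tiny",      # a directory, not a single file
    "RIFE/flownet.pkl",
]
# Either SVD checkpoint is acceptable.
SVD_CANDIDATES = ["svd_xt.safetensors", "svd_xt_1_1.safetensors"]


def missing_checkpoints(sonic_dir, svd_dir) -> list[str]:
    """Hypothetical helper: list required files that are not present."""
    sonic_dir, svd_dir = Path(sonic_dir), Path(svd_dir)
    missing = [name for name in SONIC_FILES if not (sonic_dir / name).exists()]
    if not any((svd_dir / name).is_file() for name in SVD_CANDIDATES):
        missing.append(" or ".join(SVD_CANDIDATES))
    return missing


if __name__ == "__main__":
    # Assumed locations; adjust to your installation.
    for name in missing_checkpoints("ComfyUI/models/sonic", "ComfyUI/models/checkpoints"):
        print("missing:", name)
```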

Highlighted Details

  • Supports non-square image output.
  • Fixes a batch mismatch that occurred at frame rates other than 25 fps.
  • Allows control of output duration via a duration parameter, replacing the old 'frame number' option.
  • Fixes BF16 errors and MPS device errors.
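The duration parameter and the non-25 fps batch fix both relate to how audio is split across video frames. The sketch below assumes (this is not confirmed by the source) that the frame count is `round(duration * fps)` and that 16 kHz audio is cut into one chunk per frame. At 25 fps each frame gets exactly 640 samples; at 24 fps the ideal chunk is 666.67 samples, so fixed-size integer chunks would drift out of step with the frames. Cumulative boundaries keep the chunk lengths summing to the total. `plan_frames` is a hypothetical name.

```python
from fractions import Fraction

AUDIO_SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def plan_frames(duration_s: float, fps: int, sample_rate: int = AUDIO_SAMPLE_RATE):
    """Hypothetical helper: frame count plus audio samples assigned to each frame."""
    num_frames = round(duration_s * fps)
    samples_per_frame = Fraction(sample_rate, fps)  # exact, e.g. 2000/3 at 24 fps
    # Cumulative integer boundaries: per-frame lengths differ by at most one sample
    # and always add up to the audio length the frames cover.
    bounds = [int(i * samples_per_frame) for i in range(num_frames + 1)]
    chunk_lengths = [b - a for a, b in zip(bounds, bounds[1:])]
    return num_frames, chunk_lengths


frames, chunks = plan_frames(1.0, fps=24)
print(frames, sum(chunks), sorted(set(chunks)))  # 24 16000 [666, 667]
```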

Maintenance & Community

The project acknowledges a PR from @civen-cn. Further community or maintenance details are not provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project presents a portrait-animation method and may be experimental. Out-of-memory (OOM) errors can occur, especially at larger image_size settings; reducing this value is the advised workaround. Duration control is also approximate: the README notes that the requested duration is not matched precisely.
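Halving the side length of a frame cuts its pixel count to a quarter, which is why lowering image_size is the first remedy for OOM errors. For non-square portraits, a hypothetical helper like the one below scales the longer side to a target size and snaps both sides down to a multiple of 8, assuming the 8x spatial downsampling of the SVD VAE. Whether Sonic interprets image_size as the longer side is not stated in the source, so treat this as an illustration of pre-resizing an input image rather than the node's own logic.

```python
def fit_image_size(width: int, height: int, image_size: int, multiple: int = 8) -> tuple[int, int]:
    """Hypothetical helper: scale so the longer side matches `image_size`,
    keep the aspect ratio, and snap both sides down to a multiple of `multiple`."""
    longest = max(width, height)
    # Integer arithmetic avoids float rounding (1080 * 512 // 1920 == 288 exactly).
    new_w = max(multiple, (width * image_size // longest) // multiple * multiple)
    new_h = max(multiple, (height * image_size // longest) // multiple * multiple)
    return new_w, new_h


print(fit_image_size(1080, 1920, 512))  # (288, 512) for a 9:16 portrait
```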

Health Check
Last commit: 2 months ago
Responsiveness: 1 day
Pull requests (30d): 0
Issues (30d): 2
Star history: 150 stars in the last 90 days
