ComfyUI nodes for audio-driven portrait animation, based on a research paper
This repository provides the ComfyUI custom node "ComfyUI_Sonic" for audio-driven portrait animation, enabling users to generate animated faces synchronized with audio input. It leverages global audio perception for more realistic and nuanced facial movements, targeting users of ComfyUI interested in AI-powered animation and content creation.
How It Works
Sonic integrates audio processing with Stable Video Diffusion (SVD) models to drive facial animation. It processes audio to extract features that influence the animation, aiming for a more holistic audio-to-visual mapping than frame-by-frame lip-sync. The approach pairs specific models: `whisper-tiny` for audio feature extraction and RIFE for frame interpolation, with SVD handling video generation. A rough sketch of the audio step follows.
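As a rough illustration of the audio side, the sketch below extracts global features from a clip with `whisper-tiny` via the Hugging Face `transformers` API. Sonic's internal preprocessing is not documented in this summary and may differ; treat this as orientation only.

```python
# Minimal sketch: global audio features from whisper-tiny via transformers.
# Sonic's actual pipeline is internal to the node and may differ.
import numpy as np
import torch
from transformers import WhisperFeatureExtractor, WhisperModel

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny")
model = WhisperModel.from_pretrained("openai/whisper-tiny")

# One second of 16 kHz audio; in practice this would be the driving speech clip.
audio = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    # Encode the whole clip at once -- a global representation of the audio,
    # rather than isolated per-frame lip-sync features.
    features = model.encoder(inputs.input_features).last_hidden_state

print(features.shape)  # (1, 1500, 384) for whisper-tiny
```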
Quick Start & Requirements
- Clone the repository into `./ComfyUI/custom_nodes`.
- Install the dependencies: `pip install -r requirements.txt`.
- Download the required model weights (`audio2bucket.pth`, `audio2token.pth`, `unet.pth`, `yoloface_v5m.pt`, the `whisper-tiny/` directory, `RIFE/flownet.pkl`) and an SVD checkpoint (`svd_xt.safetensors` or `svd_xt_1_1.safetensors`).
- If you hit out-of-memory errors, reduce the `image_size` setting (see Limitations & Caveats). A sketch that checks the weights are in place follows this list.
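The following pre-flight check is a sketch only: the `ComfyUI/models/sonic` layout is an assumption made here for illustration, so consult the repository README for where each file actually belongs.

```python
# Hypothetical pre-flight check that the downloaded weights are in place.
# The directory layout below is an assumption, not the repo's documented layout.
from pathlib import Path

MODELS_DIR = Path("ComfyUI/models/sonic")  # assumed location
REQUIRED = [
    "audio2bucket.pth",
    "audio2token.pth",
    "unet.pth",
    "yoloface_v5m.pt",
    "whisper-tiny",      # directory of Whisper weights
    "RIFE/flownet.pkl",
]

missing = [name for name in REQUIRED if not (MODELS_DIR / name).exists()]
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All Sonic model files found.")
```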
Highlighted Details
Video length is now controlled by a `duration` parameter, replacing the old 'frame number' option (see the sketch below).
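The sketch below shows one plausible duration-to-frames mapping, assuming a fixed frame rate and a RIFE interpolation factor; neither value is documented in this summary, so treat the numbers as illustrative rather than Sonic's actual behavior.

```python
# Illustrative only: how a duration setting could map to a frame count.
def estimated_frames(duration_s: float, fps: float = 25.0, rife_factor: int = 2) -> int:
    # SVD generates a base sequence; RIFE then multiplies the frame count
    # by interpolating intermediate frames (rife_factor is assumed here).
    base = round(duration_s * fps / rife_factor)
    return base * rife_factor

# Because of the rounding above, the realized length only approximates the
# requested duration -- consistent with the caveat under Limitations & Caveats.
print(estimated_frames(3.3))  # 82 frames ~= 3.28 s at 25 fps
```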
Maintenance & Community
The project acknowledges a PR from @civen-cn. Further community or maintenance details are not provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is presented as a method for portrait animation and may be experimental. Users might encounter OOM errors, especially with larger `image_size` settings, and are advised to reduce that value; a generic recovery pattern is sketched below. The `duration` control is likewise noted to be approximate rather than precise.
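The snippet below is a generic OOM-recovery pattern, not code from this repository; `generate()` is a hypothetical stand-in for whatever call actually produces the video.

```python
# Generic pattern: retry with a smaller image_size when CUDA memory runs out.
import torch

def generate(image_size: int):
    ...  # hypothetical call into the animation pipeline

image_size = 768
while image_size >= 256:
    try:
        generate(image_size)
        break
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        image_size -= 128  # step down and retry, as the README advises
        print(f"OOM: retrying with image_size={image_size}")
```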