Research paper implementation for audio-driven portrait animation
Sonic is the official implementation of a CVPR 2025 paper on audio-driven portrait animation that shifts the focus of animation control toward global audio perception. It is designed for researchers and developers in AI-driven media generation, offering a novel approach to audio-visual synchronization for realistic character animation.
How It Works
Sonic leverages a multi-stage pipeline that integrates audio analysis with video generation. Key components include audio-to-bucket mapping, an audio-to-token converter, and a Stable Video Diffusion model for visual synthesis. This approach allows for a more nuanced understanding and application of audio cues to animate facial expressions and movements, moving beyond simple lip-sync to capture broader emotional and prosodic elements.
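To make that data flow concrete, below is a minimal Python sketch of the three-stage idea. Every name here (extract_audio_features, audio_to_bucket, audio_to_tokens, animate) and every stub body is a hypothetical stand-in, not Sonic's actual API; only the shape of the pipeline, audio features mapped to a coarse motion-intensity bucket and a set of conditioning tokens that drive a video diffusion backbone, follows the description above.

import numpy as np

def extract_audio_features(waveform: np.ndarray, dim: int = 768) -> np.ndarray:
    """Hypothetical stand-in for a pretrained audio encoder."""
    usable = len(waveform) // 320 * 320
    frames = waveform[:usable].reshape(-1, 320)      # 320 samples = 20 ms at 16 kHz
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((320, dim)) / np.sqrt(320)
    return frames @ proj                             # one feature vector per frame

def audio_to_bucket(features: np.ndarray, num_buckets: int = 8) -> int:
    """Map global audio energy to a discrete motion-intensity bucket."""
    energy = float(np.mean(np.abs(features)))
    return min(int(energy * num_buckets), num_buckets - 1)

def audio_to_tokens(features: np.ndarray, num_tokens: int = 32) -> np.ndarray:
    """Pool frame features into a fixed number of conditioning tokens."""
    chunks = np.array_split(features, num_tokens)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

def animate(image: np.ndarray, tokens: np.ndarray, bucket: int) -> np.ndarray:
    """Placeholder for the video diffusion backbone, which would denoise
    video latents conditioned on the tokens and the bucket."""
    num_frames = 25
    return np.repeat(image[None], num_frames, axis=0)

waveform = np.zeros(16000)           # 1 s of silent dummy audio at 16 kHz
portrait = np.zeros((512, 512, 3))   # reference portrait image
feats = extract_audio_features(waveform)
video = animate(portrait, audio_to_tokens(feats), audio_to_bucket(feats))
print(video.shape)                   # (25, 512, 512, 3)

In the real project each stub is a learned module and the final stage is a Stable Video Diffusion denoising loop; the sketch only shows how the conditioning signals fit together.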
Quick Start & Requirements
pip3 install -r requirements.txt
Download the pretrained weights with huggingface-cli or from the provided Google Drive links, then run:
python3 demo.py <input_image> <input_audio> <output_video>
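For example, with illustrative file names (these paths are placeholders, not actual files shipped with the repository):

python3 demo.py examples/portrait.png examples/speech.wav output/result.mp4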
Maintenance & Community
The project is actively maintained, with inference code and weights released recently. Community contributions are encouraged, with a ComfyUI version already integrated. A QQ chat group is available for community interaction.
Licensing & Compatibility
The project is licensed for non-commercial use only; commercial use requires Tencent Cloud's Video Creation Large Model instead.
Limitations & Caveats
The primary limitation is the non-commercial use restriction. The model has been tested only on Linux with a specific GPU configuration; compatibility with other operating systems or hardware setups is not guaranteed.