Research paper implementation for audio-driven portrait animation
Sonic is the official implementation of a CVPR 2025 paper on audio-driven portrait animation that shifts the focus of animation control toward global audio perception. It is designed for researchers and developers in AI-driven media generation, offering a novel approach to audio-visual synchronization for realistic character animation.
How It Works
Sonic leverages a multi-stage pipeline that integrates audio analysis with video generation. Key components include audio-to-bucket mapping, an audio-to-token converter, and a Stable Video Diffusion model for visual synthesis. This approach allows for a more nuanced understanding and application of audio cues to animate facial expressions and movements, moving beyond simple lip-sync to capture broader emotional and prosodic elements.
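To make that data flow concrete, below is a minimal Python sketch of the three-stage idea. Every name here (extract_audio_features, audio_to_bucket, audio_to_tokens, animate) and every stub body is a hypothetical stand-in, not Sonic's actual API; only the shape of the pipeline, audio features mapped to a coarse motion-intensity bucket and a set of conditioning tokens that drive a video diffusion backbone, follows the description above.

import numpy as np

def extract_audio_features(waveform: np.ndarray, dim: int = 768) -> np.ndarray:
    """Hypothetical stand-in for a pretrained audio encoder."""
    usable = len(waveform) // 320 * 320
    frames = waveform[:usable].reshape(-1, 320)      # 320 samples = 20 ms at 16 kHz
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((320, dim)) / np.sqrt(320)
    return frames @ proj                             # one feature vector per frame

def audio_to_bucket(features: np.ndarray, num_buckets: int = 8) -> int:
    """Map global audio energy to a discrete motion-intensity bucket."""
    energy = float(np.mean(np.abs(features)))
    return min(int(energy * num_buckets), num_buckets - 1)

def audio_to_tokens(features: np.ndarray, num_tokens: int = 32) -> np.ndarray:
    """Pool frame features into a fixed number of conditioning tokens."""
    chunks = np.array_split(features, num_tokens)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

def animate(image: np.ndarray, tokens: np.ndarray, bucket: int) -> np.ndarray:
    """Placeholder for the video diffusion backbone, which would denoise
    video latents conditioned on the tokens and the bucket."""
    num_frames = 25
    return np.repeat(image[None], num_frames, axis=0)

waveform = np.zeros(16000)           # 1 s of silent dummy audio at 16 kHz
portrait = np.zeros((512, 512, 3))   # reference portrait image
feats = extract_audio_features(waveform)
video = animate(portrait, audio_to_tokens(feats), audio_to_bucket(feats))
print(video.shape)                   # (25, 512, 512, 3)

In the real project each stub is a learned module and the final stage is a Stable Video Diffusion denoising loop; the sketch only shows how the conditioning signals fit together.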
Quick Start & Requirements
pip3 install -r requirements.txt
Download the pretrained weights with huggingface-cli or from the provided Google Drive links, then run:
python3 demo.py <input_image> <input_audio> <output_video>
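For example, with illustrative file names (these paths are placeholders, not actual files shipped with the repository):

python3 demo.py examples/portrait.png examples/speech.wav output/result.mp4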
Maintenance & Community
The project is actively maintained, with inference code and weights released recently. Community contributions are encouraged, with a ComfyUI version already integrated. A QQ chat group is available for community interaction.
Licensing & Compatibility
The project is licensed for non-commercial use only; commercial use requires Tencent Cloud's Video Creation Large Model instead.
Limitations & Caveats
The primary limitation is the non-commercial use restriction. The model has been tested only on Linux with a specific GPU configuration; compatibility with other operating systems or hardware setups is not guaranteed.