Sonic by jixiaozhong

Research paper implementation for audio-driven portrait animation

created 8 months ago
2,955 stars

Top 16.5% on sourcepulse

Project Summary

Sonic is the official implementation of a CVPR 2025 paper on audio-driven portrait animation, with the stated aim of improving global audio perception. It targets researchers and developers working on AI-driven media generation and offers an approach to audio-visual synchronization for realistic character animation.

How It Works

Sonic uses a multi-stage pipeline that combines audio analysis with video generation. Key components include audio-to-bucket mapping, an audio-to-token converter, and a Stable Video Diffusion model for visual synthesis. The audio cues drive facial expressions and movements, so the animation goes beyond simple lip-sync to capture broader emotional and prosodic elements.

Quick Start & Requirements

  • Installation: pip3 install -r requirements.txt
  • Prerequisites: NVIDIA GPU with CUDA support (tested on a single 32GB GPU), Linux OS.
  • Model Downloads: Requires downloading checkpoints for Sonic, Stable Video Diffusion, and Whisper via huggingface-cli or provided Google Drive links.
  • Demo: Run inference with python3 demo.py <input_image> <input_audio> <output_video>.
  • Resources: Links to online demos and a YouTube tutorial are available.
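
The steps above can be wrapped in a small helper script. The following is a minimal sketch, not part of the Sonic repository: the function names check_prerequisites and build_demo_command are hypothetical, the nvidia-smi lookup is only a rough heuristic for the GPU requirement, and the file names are placeholders. The argument order follows the demo command as quoted in this summary and should be checked against the repository before use.

```python
import platform
import shutil


def check_prerequisites() -> list:
    """Return warnings for the prerequisites listed in this summary."""
    warnings = []
    # The summary lists Linux as the tested operating system.
    if platform.system() != "Linux":
        warnings.append(
            f"Tested on Linux; this machine reports {platform.system()!r}."
        )
    # Assumption of this sketch: nvidia-smi on PATH suggests an NVIDIA driver
    # is installed. It does not prove that CUDA works or that 32GB is free.
    if shutil.which("nvidia-smi") is None:
        warnings.append(
            "nvidia-smi not found on PATH; an NVIDIA GPU with CUDA is expected."
        )
    return warnings


def build_demo_command(image, audio, output, script="demo.py"):
    """Build argv for: python3 demo.py <input_image> <input_audio> <output_video>."""
    return ["python3", script, str(image), str(audio), str(output)]


if __name__ == "__main__":
    for warning in check_prerequisites():
        print("warning:", warning)
    # Placeholder file names, for illustration only.
    cmd = build_demo_command("portrait.png", "speech.wav", "result.mp4")
    print(" ".join(cmd))
    # To launch inference from the repository root, pass cmd to
    # subprocess.run(cmd, check=True) after the checkpoints are downloaded.
```

Returning the command as a list rather than a shell string avoids quoting problems when file paths contain spaces.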

Highlighted Details

  • Accepted to CVPR 2025.
  • Offers online demos and a Hugging Face space for interaction.
  • A ComfyUI version is available via community contribution.
  • Inference code and weights were released in January 2025.

Maintenance & Community

The project is actively maintained: inference code and weights were released in January 2025, and the last commit was one month ago. Community contributions are encouraged, and a community-contributed ComfyUI version is already available. A QQ chat group is available for community interaction.

Licensing & Compatibility

The project is licensed for non-commercial use only. Commercial use requires the Tencent Cloud Video Creation Large Model instead.

Limitations & Caveats

The primary limitation is the non-commercial use restriction. The model has been tested only on Linux with a single 32GB NVIDIA GPU; compatibility with other operating systems or hardware setups is not guaranteed.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 2

Star History

347 stars in the last 90 days
