Lip-sync system for talking-head video editing (research paper)
Top 7.4% on sourcepulse
VideoReTalking is a system for editing talking-head videos to achieve audio-driven lip synchronization with optional expression modification. It targets researchers and developers working on video editing and synthesis, producing high-quality, lip-synced output even when the desired emotion differs from the original footage.
How It Works
The system employs a three-stage pipeline: first, an expression editing network modifies facial expressions to a canonical form. Second, a lip-sync network synchronizes the video frames with input audio. Finally, an identity-aware face enhancement network and post-processing steps improve photo-realism. This sequential, learning-based approach allows for end-to-end processing without manual intervention.
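To make the data flow concrete, here is a minimal sketch of how the three stages compose. The function names are hypothetical placeholders, not VideoReTalking's actual API, and each body is a stand-in for the corresponding network:

```python
# Minimal sketch of the three-stage pipeline described above.
# All function names are hypothetical stand-ins, not the project's API.
from typing import List
import numpy as np

def edit_expression(frames: List[np.ndarray]) -> List[np.ndarray]:
    # Stage 1: map each face to a canonical, neutral expression.
    return frames  # stand-in for the expression-editing network

def sync_lips(frames: List[np.ndarray], audio: np.ndarray) -> List[np.ndarray]:
    # Stage 2: regenerate the mouth region so it matches the input audio.
    return frames  # stand-in for the lip-sync network

def enhance_face(frames: List[np.ndarray]) -> List[np.ndarray]:
    # Stage 3: identity-aware enhancement and post-processing for realism.
    return frames  # stand-in for the enhancement network

def retalk(frames: List[np.ndarray], audio: np.ndarray) -> List[np.ndarray]:
    # The stages run sequentially, with no manual steps in between.
    canonical = edit_expression(frames)
    synced = sync_lips(canonical, audio)
    return enhance_face(synced)
```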
Quick Start & Requirements
Installation uses conda and pip, and requires Python 3.8 and CUDA 11.1. Pre-trained models must be downloaded and placed in ./checkpoints. Inference is then run with: python3 inference.py --face <input_video> --audio <input_audio> --outfile <output_video>
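For batch use, the CLI above can be driven from a short script; the following sketch uses Python's subprocess module, with hypothetical input and output paths and only the flags shown in the command above:

```python
# Run several clips through the inference CLI shown above.
# The input/output paths here are hypothetical examples.
import subprocess
from pathlib import Path

pairs = [
    ("inputs/speaker1.mp4", "inputs/speech1.wav"),
    ("inputs/speaker2.mp4", "inputs/speech2.wav"),
]

Path("results").mkdir(exist_ok=True)
for face, audio in pairs:
    out = Path("results") / (Path(face).stem + "_retalked.mp4")
    subprocess.run(
        ["python3", "inference.py",
         "--face", face,
         "--audio", audio,
         "--outfile", str(out)],
        check=True,  # fail fast if any single inference run errors out
    )
```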
Maintenance & Community
The project lists authors from Xidian University and Tencent AI Lab. No specific community channels (Discord/Slack) or roadmap are provided in the README. The repository was last updated about a year ago and is currently inactive.
Licensing & Compatibility
The README states compliance with an "open-source license" and "intellectual property declaration" but does not specify the license type. It includes a disclaimer that it is not an official Tencent product and prohibits using Tencent names/logos without permission.
Limitations & Caveats
The D-Net module cannot handle extreme poses. The project's disclaimer also warns against using the code for harmful activities or misrepresentation.