Talking portrait generator research paper
Top 96.2% on sourcepulse
KDTalker addresses the challenge of generating realistic, audio-driven talking portraits, with a focus on pose diversity, accuracy, and efficient generation. It is designed for researchers and developers in computer vision and AI, offering a novel approach to audio-driven animation with potential applications in virtual avatars, content creation, and accessibility tools.
How It Works
KDTalker employs an implicit keypoint-based spatiotemporal diffusion model. This approach leverages diffusion models for generating high-fidelity sequences, guided by extracted keypoints that capture both facial landmarks and head pose. The implicit representation allows for efficient handling of pose variations, while the spatiotemporal diffusion ensures coherence across frames, leading to more natural and diverse animations compared to methods relying solely on explicit motion transfer.
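As a rough illustration of this idea, the Python sketch below shows the general shape of such a pipeline: a denoiser predicts the noise on an entire keypoint sequence conditioned on per-frame audio features, and a toy reverse-diffusion loop samples the sequence. This is not KDTalker's implementation; the network, keypoint count, feature dimensions, and noise schedule here are illustrative assumptions.

# Minimal sketch (not the official KDTalker code) of audio-conditioned
# spatiotemporal diffusion over implicit keypoints. All names and sizes
# are illustrative assumptions.
import torch
import torch.nn as nn


class KeypointDenoiser(nn.Module):
    """Predicts the noise added to a keypoint sequence, given audio conditioning."""

    def __init__(self, num_kp=21, kp_dim=3, audio_dim=64, hidden=256):
        super().__init__()
        self.in_proj = nn.Linear(num_kp * kp_dim + audio_dim + 1, hidden)
        # A temporal transformer models dependencies across frames, so the
        # sampled motion stays coherent over the whole sequence.
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.out_proj = nn.Linear(hidden, num_kp * kp_dim)

    def forward(self, noisy_kp, audio_feat, t):
        # noisy_kp: (B, T, num_kp * kp_dim), audio_feat: (B, T, audio_dim), t: (B,)
        t_emb = t.float().view(-1, 1, 1).expand(-1, noisy_kp.size(1), 1)
        x = torch.cat([noisy_kp, audio_feat, t_emb], dim=-1)
        return self.out_proj(self.temporal(self.in_proj(x)))


@torch.no_grad()
def sample_keypoints(model, audio_feat, steps=50):
    """Toy reverse-diffusion loop: start from noise, iteratively denoise."""
    B, T, _ = audio_feat.shape
    kp = torch.randn(B, T, 21 * 3)  # Gaussian noise over the whole sequence
    for step in reversed(range(steps)):
        t = torch.full((B,), step)
        eps = model(kp, audio_feat, t)
        kp = kp - eps / steps  # crude update standing in for a real noise schedule
    # Implicit keypoints (landmarks + pose) per frame, to be fed to a renderer.
    return kp.view(B, T, 21, 3)


if __name__ == "__main__":
    model = KeypointDenoiser()
    audio = torch.randn(1, 64, 64)  # e.g. per-frame audio embeddings
    print(sample_keypoints(model, audio).shape)  # torch.Size([1, 64, 21, 3])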
Quick Start & Requirements
Install the dependencies:
pip install -r requirements.txt
Download the pretrained weights and place them in the ../pretrained_weights and ./ckpts directories as specified. An online demo is available at kdtalker.com, and a Huggingface Space demo is provided for slower inference. Run inference with:
python inference.py -source_image <path> -driven_audio <path> -output <path>
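To animate several portraits in one go, the command above can be wrapped in a small driver script. The sketch below assumes the inference.py flags shown in this section; the directory names and file extensions are placeholders.

# Minimal sketch of batching the CLI shown above with subprocess.
# Flag names follow the command in this section; paths are placeholders.
import subprocess
from pathlib import Path


def animate_all(image_dir: str, audio_path: str, out_dir: str) -> None:
    """Run inference.py once per portrait image, reusing the same driving audio."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(image_dir).glob("*.png")):
        subprocess.run(
            [
                "python", "inference.py",
                "-source_image", str(image),
                "-driven_audio", audio_path,
                "-output", str(out / f"{image.stem}.mp4"),
            ],
            check=True,  # stop on the first failed run
        )


if __name__ == "__main__":
    animate_all("portraits/", "speech.wav", "results/")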
Highlighted Details
Maintenance & Community
The project is maintained by Chaolong Yang and collaborators from multiple institutions. The primary contact for inquiries is chaolong.yang@liverpool.ac.uk. The project acknowledges several other open-source works.
Licensing & Compatibility
Licensed under CC-BY-NC 4.0. This license restricts commercial use and requires attribution. For commercial use, contact the authors directly.
Limitations & Caveats
The current release includes only the inference code and demo; training code is planned for a future release. The CC-BY-NC 4.0 license limits commercial applications.