KDTalker by chaolongy

Talking portrait generator research paper

created 4 months ago
269 stars

Top 96.2% on sourcepulse

Project Summary

KDTalker addresses the challenge of generating realistic talking portraits driven by audio, aiming for diverse head poses and high accuracy at low computational cost. It is designed for researchers and developers in computer vision and AI, offering a novel approach to audio-driven animation with potential applications in virtual avatars, content creation, and accessibility tools.

How It Works

KDTalker employs an implicit keypoint-based spatiotemporal diffusion model. This approach leverages diffusion models for generating high-fidelity sequences, guided by extracted keypoints that capture both facial landmarks and head pose. The implicit representation allows for efficient handling of pose variations, while the spatiotemporal diffusion ensures coherence across frames, leading to more natural and diverse animations compared to methods relying solely on explicit motion transfer.
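To make the idea concrete, here is a minimal, illustrative sketch of a DDPM-style reverse (denoising) loop over a keypoint sequence. This is not KDTalker's implementation: the schedule is a generic linear one, and `predict_eps` is a stand-in for the audio-conditioned spatiotemporal network the paper describes.

```python
import math
import random

def beta_schedule(steps, lo=1e-4, hi=0.02):
    """Linear variance schedule (a standard DDPM choice)."""
    return [lo + (hi - lo) * t / (steps - 1) for t in range(steps)]

def cumprod(xs):
    """Running product, used for the alpha-bar terms."""
    out, acc = [], 1.0
    for x in xs:
        acc *= x
        out.append(acc)
    return out

def denoise(x_t, steps, predict_eps):
    """Iteratively remove predicted noise from a noisy keypoint
    sequence x_t (a flat list of floats: frames x keypoints x 2).
    `predict_eps(x, t)` stands in for the learned, audio-conditioned
    spatiotemporal denoising network."""
    betas = beta_schedule(steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars = cumprod(alphas)
    x = list(x_t)
    for t in reversed(range(steps)):
        eps = predict_eps(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # DDPM posterior mean update (stochastic term omitted for clarity)
        x = [(xi - coef * ei) / math.sqrt(alphas[t])
             for xi, ei in zip(x, eps)]
    return x

# Example: run the reverse loop on 2 frames of 3 keypoints (12 numbers)
# with a dummy predictor that always predicts zero noise.
random.seed(0)
noisy = [random.gauss(0, 1) for _ in range(12)]
clean = denoise(noisy, steps=50, predict_eps=lambda x, t: [0.0] * len(x))
```

In the real model the predictor sees the audio features and neighboring frames, which is what enforces the cross-frame coherence described above.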

Quick Start & Requirements

  • Install: Clone the repository and create a conda environment with Python 3.9. Install PyTorch 2.3.0 built for CUDA 11.8, then run pip install -r requirements.txt.
  • Prerequisites: Git, Conda, and FFmpeg. An NVIDIA GPU such as an RTX 3090 or RTX 4090 is required.
  • Pretrained Weights: Download the weights from Google Drive or Hugging Face and place them in the ./pretrained_weights and ./ckpts directories as specified in the README.
  • Demo: An online demo is available at kdtalker.com, and a Hugging Face Space demo is provided (with slower inference).
  • Inference: python inference.py -source_image <path> -driven_audio <path> -output <path>
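The steps above can be sketched as a shell transcript. The repository URL, environment name, and PyTorch wheel index are assumptions inferred from the project name and the stated CUDA version; verify them against the README.

```shell
# Hypothetical setup transcript; repo URL and wheel index are assumptions.
git clone https://github.com/chaolongy/KDTalker.git
cd KDTalker

conda create -n kdtalker python=3.9 -y
conda activate kdtalker

# PyTorch 2.3.0 built against CUDA 11.8, then the project requirements
pip install torch==2.3.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

# Place the downloaded weights under ./pretrained_weights and ./ckpts, then:
python inference.py -source_image <image> -driven_audio <audio> -output <out>
```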

Highlighted Details

  • Trained on 4,282 video clips from VoxCeleb.
  • Uses an implicit keypoint-based spatiotemporal diffusion model for accurate, efficient generation.
  • Enables diverse pose generation for audio-driven talking portraits.

Maintenance & Community

The project is maintained by Chaolong Yang and collaborators from multiple institutions. The primary contact for inquiries is chaolong.yang@liverpool.ac.uk. The project acknowledges several other open-source works.

Licensing & Compatibility

Licensed under CC-BY-NC 4.0. This license restricts commercial use and requires attribution. For commercial use, contact the authors directly.

Limitations & Caveats

The current release focuses on inference code and demo. Training code is planned for future release. The CC-BY-NC 4.0 license limits commercial applications.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 19 stars in the last 90 days
