Lip-to-speech synthesis: generating speech from silent lip movements
This repository provides code for Lip2Wav, a system that generates intelligible speech from lip movements in unconstrained video settings. It is targeted at researchers and developers in audio-visual speech processing and aims to enable high-quality, style-accurate lip-to-speech synthesis.
How It Works
Lip2Wav employs a sequence-to-sequence model that maps visual lip movements directly to speech: a video encoder summarizes lip motion, and an attention-based decoder generates the corresponding audio. The repository provides pre-trained models and complete training and inference code for generating speech from video input, and the system is designed to capture individual speaking styles for more accurate synthesis.
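For intuition, the sketch below is a minimal, hypothetical PyTorch analogue of that encoder-decoder shape: a 3D-convolutional encoder over face frames feeding an attention decoder that emits mel-spectrogram frames. Module names, layer sizes, and spectrogram dimensions are illustrative assumptions, not the repository's actual implementation.

```python
# Illustrative lip-to-speech seq2seq sketch (hypothetical; all names
# and sizes are assumptions, not the repository's actual code).
import torch
import torch.nn as nn

class LipEncoder(nn.Module):
    """Encodes a window of lip-region frames into a feature sequence."""
    def __init__(self, feat_dim=256):
        super().__init__()
        # 3D convolutions capture short-range motion across frames.
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=5, stride=(1, 2, 2), padding=2),
            nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool away spatial dims
        )
        self.rnn = nn.GRU(64, feat_dim, batch_first=True, bidirectional=True)

    def forward(self, frames):           # frames: (B, 3, T, H, W)
        x = self.conv(frames)            # (B, 64, T, 1, 1)
        x = x.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        out, _ = self.rnn(x)             # (B, T, 2 * feat_dim)
        return out

class MelDecoder(nn.Module):
    """Attends over encoder features and predicts mel-spectrogram frames."""
    def __init__(self, enc_dim=512, n_mels=80, hidden=512):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, hidden)
        self.prenet = nn.Linear(n_mels, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.mel_out = nn.Linear(hidden, n_mels)

    def forward(self, enc_feats, prev_mels):  # teacher forcing in training
        enc = self.enc_proj(enc_feats)
        q, _ = self.rnn(self.prenet(prev_mels))
        ctx, _ = self.attn(q, enc, enc)        # attend over video features
        return self.mel_out(q + ctx)           # (B, T_mel, n_mels)
```

In this standard setup the decoder is teacher-forced on ground-truth mel frames during training and feeds back its own predictions at inference, with a vocoder turning the predicted spectrogram into a waveform.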
Quick Start & Requirements
Install dependencies with `pip install -r requirements.txt`. A pre-trained face detection model (`s3fd.pth`) must also be downloaded. The `download_speaker.sh` script can be used to fetch video data. Preprocessing involves running `python preprocess.py`, and inference is initiated with `python complete_test_generate.py`.
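For a rough picture of what the preprocessing stage produces, the hypothetical sketch below pairs resized video frames with a log-mel spectrogram using OpenCV and librosa. The file names, parameters, and whole-frame resize (the real pipeline crops the detected face first) are assumptions for illustration, not the actual behavior of `preprocess.py`.

```python
# Hypothetical sketch of the kind of data preprocessing prepares:
# per-video frame sequences plus a mel-spectrogram target.
import cv2
import librosa
import numpy as np

def extract_frames(video_path, size=96):
    """Read a video and return a (T, size, size, 3) array of resized frames."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # The real pipeline crops the detected face (via the s3fd model)
        # before resizing; here we simply resize the whole frame.
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    return np.stack(frames) if frames else np.empty((0, size, size, 3))

def extract_mel(audio_path, sr=16000, n_mels=80):
    """Compute a log-mel spectrogram of shape (T_mel, n_mels)."""
    wav, _ = librosa.load(audio_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=wav, sr=sr, n_fft=800,
                                         hop_length=200, n_mels=n_mels)
    return np.log(mel + 1e-5).T

frames = extract_frames("speaker_clip.mp4")  # hypothetical input files
mels = extract_mel("speaker_clip.wav")
print(frames.shape, mels.shape)
```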
Highlighted Details
Maintenance & Community
The project accompanies a CVPR 2020 paper. The repository appears inactive (last updated roughly two years ago), and the README does not describe community channels or ongoing maintenance.
Licensing & Compatibility
Limitations & Caveats
The code is tested with Python 3.7.4, and compatibility with newer Python versions is not guaranteed. The README points to a separate repository by the same authors, Wav2Lip, for the related task of generating lip-synced talking-face videos, which suggests the authors' development focus may have shifted.