Extends Bert-VITS2 for facial expression and animation generation
This project extends Bert-VITS2 for synchronized facial expression generation alongside Text-to-Speech (TTS), targeting users who need to create animated avatars or virtual characters. It aims to bridge the gap between speech and visual emotion, offering a more lifelike and expressive output.
How It Works
The core approach leverages the latent representations (z) from the Bert-VITS2 TTS model. A parallel LSTM-plus-MLP network is trained to map these latent variables to facial expression parameters (e.g., Live Link Face values). Because expression prediction lives in this side-branch, expression data is generated directly from the TTS process without altering the original VITS architecture.
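As a rough illustration of that side-branch, the PyTorch sketch below maps a sequence of frame-level latents to per-frame blendshape values. The `ExpressionHead` name, layer sizes, and 61-channel output are illustrative assumptions, not the project's actual code.

```python
# Hypothetical sketch of the expression side-branch: an LSTM over the
# frame-level VITS latents z, followed by an MLP that regresses per-frame
# facial blendshape values (e.g., Live Link Face / ARKit-style channels).
import torch
import torch.nn as nn


class ExpressionHead(nn.Module):
    def __init__(self, z_dim=192, hidden_dim=256, n_blendshapes=61):
        super().__init__()
        # LSTM smooths the latent sequence over time.
        self.lstm = nn.LSTM(z_dim, hidden_dim, batch_first=True, bidirectional=True)
        # MLP maps each LSTM state to blendshape coefficients in [0, 1].
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_blendshapes),
            nn.Sigmoid(),
        )

    def forward(self, z):
        # z: (batch, frames, z_dim) latents taken from the frozen TTS model.
        h, _ = self.lstm(z)
        return self.mlp(h)  # (batch, frames, n_blendshapes)


if __name__ == "__main__":
    head = ExpressionHead()
    z = torch.randn(1, 120, 192)   # e.g., 120 frames of VITS latents
    expr = head(z)                 # per-frame expression parameters
    print(expr.shape)              # torch.Size([1, 120, 61])
```

Training such a head against recorded blendshape data leaves the TTS weights untouched, which is the point of the side-branch design described above.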
Quick Start & Requirements
Data preprocessing is handled via webui_preprocess.py, following the Bert-VITS2 workflow. Demos are available on Bilibili.
Highlighted Details
Maintenance & Community
The project is actively developed by "see2023" and cites the "anyvoiceai/MassTTS" project as a key inspiration. No dedicated community channels are listed, but the project is open source.
Licensing & Compatibility
The project's license is not explicitly stated in the README. Because it builds on Bert-VITS2, the upstream license terms apply; compatibility with commercial or closed-source applications would require verifying the Bert-VITS2 license and any modifications made.
Limitations & Caveats
The project is experimental: retraining the approach directly on GPT-SoVITS has reported issues and potential quality degradation. Motion generation via MotionGPT handles non-English text poorly and requires manual mapping for Unreal Engine (UE) compatibility. Audio and expression streams may be offset from each other and require manual synchronization tuning, as sketched below.
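The offset tuning mentioned above can be as simple as time-shifting the predicted expression frames against the audio timeline. The following is a minimal sketch under an assumed frame rate and channel count, not the project's actual tooling.

```python
# Shift predicted expression frames relative to the audio by a manually
# chosen offset. Frame rate (60 fps) and channel count (61) are assumptions.
import numpy as np


def apply_sync_offset(expr_frames: np.ndarray, offset_s: float, fps: float = 60.0) -> np.ndarray:
    """Shift expression frames by offset_s seconds (positive delays the face)."""
    shift = int(round(offset_s * fps))
    if shift == 0:
        return expr_frames
    # Hold the first (or last) frame to fill the gap created by the shift.
    pad = np.repeat(expr_frames[:1] if shift > 0 else expr_frames[-1:], abs(shift), axis=0)
    if shift > 0:
        # Delay: prepend held first frame, drop the tail to keep length.
        return np.concatenate([pad, expr_frames[:-shift]], axis=0)
    # Advance: drop the head, append held last frame.
    return np.concatenate([expr_frames[-shift:], pad], axis=0)


# Example: delay the expression stream by 50 ms at 60 fps.
frames = np.random.rand(300, 61)   # 5 s of 61-channel blendshape values
aligned = apply_sync_offset(frames, offset_s=0.05)
print(aligned.shape)               # (300, 61)
```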