Bert-VITS2-ext by see2023

Extends Bert-VITS2 for facial expression and animation generation

created 1 year ago · 531 stars · Top 60.4% on sourcepulse

View on GitHub
Project Summary

This project extends Bert-VITS2 for synchronized facial expression generation alongside Text-to-Speech (TTS), targeting users who need to create animated avatars or virtual characters. It aims to bridge the gap between speech and visual emotion, offering a more lifelike and expressive output.

How It Works

The core approach leverages the latent representation (z) produced by the Bert-VITS2 TTS model. A parallel LSTM-plus-MLP network is trained to map these latent frames to facial expression parameters (e.g., Live Link Face blendshape values). Because the predictor is a side branch, expression data falls out of the ordinary TTS forward pass without any change to the original VITS architecture.
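A minimal PyTorch sketch of such a side branch follows. It is illustrative rather than the repository's actual code: z_dim=192 matches the usual VITS inter_channels default, and n_expr=61 approximates the curve set Live Link Face streams; both are assumptions.

```python
import torch
import torch.nn as nn

class ExpressionPredictor(nn.Module):
    """Side branch mapping TTS latent frames to expression curves.

    All sizes are illustrative: z_dim=192 follows the common VITS
    default, and n_expr=61 approximates Live Link Face's curve set.
    """

    def __init__(self, z_dim: int = 192, hidden: int = 256, n_expr: int = 61):
        super().__init__()
        # The LSTM captures temporal context so curves evolve smoothly.
        self.lstm = nn.LSTM(z_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        # The MLP head projects each frame to expression parameters.
        self.mlp = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_expr),
            nn.Sigmoid(),  # blendshape weights live in [0, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, z_dim, frames), as produced inside the TTS model.
        x = z.transpose(1, 2)   # -> (batch, frames, z_dim)
        x, _ = self.lstm(x)     # -> (batch, frames, 2 * hidden)
        return self.mlp(x)      # -> (batch, frames, n_expr)
```

Training then reduces to regressing the predicted curves onto captured expression data that is frame-aligned with the TTS latents (e.g., an MSE loss), typically while the TTS weights stay frozen.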

Quick Start & Requirements

  • Install: Clone the repository and install its Python dependencies; the Bert-VITS2 backbone's requirements apply.
  • Prerequisites: Python and ffmpeg for audio processing (see the resampling sketch after this list). Specific versions are not strictly enforced, but compatibility with the Bert-VITS2 backbone is implied. A GPU is highly recommended for training and inference.
  • Resources: Training requires significant computational resources and a dataset of synchronized audio and facial motion data.
  • Docs: A quick-start guide lives in webui_preprocess.py; demos are on Bilibili.
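As a concrete illustration of the ffmpeg prerequisite, the sketch below resamples a recording to mono 44.1 kHz WAV, a common Bert-VITS2 training format; the target rate and file paths are assumptions to verify against your own config.

```python
import subprocess
from pathlib import Path

def resample(src: str, dst: str, sr: int = 44100) -> None:
    """Convert a recording to mono WAV at the training sample rate.

    44.1 kHz is common for Bert-VITS2 configs, but check the
    sampling_rate in your config.json -- this default is an assumption.
    """
    Path(dst).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-y",     # overwrite output if it exists
         "-i", src,
         "-ac", "1",         # downmix to mono
         "-ar", str(sr),     # resample
         dst],
        check=True,
    )

resample("raw/clip.m4a", "dataset/wavs/clip.wav")
```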

Highlighted Details

  • Integrates with Unreal Engine (UE) via OSC/VMC protocols for real-time animation preview (see the streaming sketch after this list).
  • Explores extensions to other TTS models like GPT-SoVITS and CosyVoice.
  • Includes experimental integration with MotionGPT for generating body animations based on text.
  • Demonstrates audio-to-photoreal capabilities for generating animation data.
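For the OSC/VMC preview path, a minimal sketch using the python-osc package is shown below. The default port 39539 and the /VMC/Ext/Blend addresses follow the public VMC protocol spec, but treat them, and the curve names, as assumptions to verify against your UE receiver.

```python
from pythonosc.udp_client import SimpleUDPClient  # pip install python-osc

# Assumptions: a VMC-compatible receiver (e.g., a UE plugin) listens on
# localhost:39539 (the VMC default port) and recognizes these curve names.
client = SimpleUDPClient("127.0.0.1", 39539)

def send_frame(blendshapes: dict[str, float]) -> None:
    """Stream one frame of expression values using VMC's OSC addresses."""
    for name, value in blendshapes.items():
        # One (name, value) pair per /VMC/Ext/Blend/Val message.
        client.send_message("/VMC/Ext/Blend/Val", [name, float(value)])
    # Ask the receiver to apply the batch it just received.
    client.send_message("/VMC/Ext/Blend/Apply", [])

send_frame({"JawOpen": 0.42, "MouthSmileLeft": 0.18})
```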

Maintenance & Community

The project is developed by "see2023" and credits the "anyvoiceai/MassTTS" project as a key inspiration. No dedicated community channels are listed, but the project is open source.

Licensing & Compatibility

The project's license is not explicitly stated in the README. It is based on Bert-VITS2, which is distributed under the copyleft AGPL-3.0 license rather than a permissive one, so use in commercial or closed-source applications would require verifying the obligations of that license and of any modifications made here.

Limitations & Caveats

The project is experimental: direct retraining on GPT-SoVITS has noted issues and potential quality degradation; motion generation via MotionGPT struggles with non-English text and requires manual mapping for UE compatibility; and audio-expression synchronization may exhibit offsets that need manual tuning (a minimal shifting sketch follows).
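Compensating for such an offset is typically a matter of shifting the expression track by a hand-tuned lag, as in the sketch below; the 60 fps frame rate and the edge-hold padding are assumptions, not values from the repository.

```python
import numpy as np

def shift_expressions(frames: np.ndarray, offset_s: float,
                      fps: float = 60.0) -> np.ndarray:
    """Shift expression curves in time to line up with the audio.

    frames: (n_frames, n_curves) predicted expression values.
    offset_s: positive delays the expressions; negative advances them.
    """
    k = int(round(offset_s * fps))
    if k == 0:
        return frames
    if k > 0:  # delay: hold the first frame, drop the tail
        pad = np.repeat(frames[:1], k, axis=0)
        return np.vstack([pad, frames[:-k]])
    # advance: drop the first |k| frames, hold the last frame at the end
    pad = np.repeat(frames[-1:], -k, axis=0)
    return np.vstack([frames[-k:], pad])

aligned = shift_expressions(np.zeros((300, 61)), offset_s=-0.05)
```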

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
