Bert-VITS2 by fishaudio

VITS2 backbone for multilingual text-to-speech

Created 2 years ago

8,692 stars


Project Summary

Bert-VITS2 is a text-to-speech (TTS) system that integrates a VITS2 backbone with multilingual BERT for enhanced voice synthesis. It is designed for advanced users and researchers interested in TTS technology, offering a foundation for custom voice model training and experimentation.

How It Works

This project builds upon the VITS2 architecture, incorporating multilingual BERT embeddings to improve prosody and naturalness in speech generation. The core idea is to leverage BERT's contextual understanding of text to inform the VITS2 model, leading to more expressive and human-like synthesized speech.
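To make the conditioning idea more concrete, here is a minimal pure-Python sketch of one way word-level BERT features could be aligned to phonemes and merged into a TTS text encoder's input. The sketch assumes that each BERT vector is repeated once per phoneme of its word, with the per-word phoneme counts passed as a list called `word2ph`. It also assumes that fusion is a linear projection followed by element-wise addition to the phoneme embeddings. The function names are hypothetical. This is an illustration of the general technique, not Bert-VITS2's actual implementation, which operates on PyTorch tensors with learned weights.

```python
from typing import List

Vector = List[float]


def expand_word_features(word_feats: List[Vector], word2ph: List[int]) -> List[Vector]:
    """Hypothetical helper: repeat each word-level feature once per phoneme it covers."""
    if len(word_feats) != len(word2ph):
        raise ValueError("need exactly one phoneme count per word feature")
    phone_feats: List[Vector] = []
    for feat, n_ph in zip(word_feats, word2ph):
        # Copy the vector so later edits to one phoneme do not alias the others.
        phone_feats.extend([list(feat) for _ in range(n_ph)])
    return phone_feats


def project(vec: Vector, weight: List[Vector]) -> Vector:
    """Linear projection; `weight` is laid out as [out_dim][in_dim]."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def fuse(phone_emb: List[Vector], phone_bert: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: add projected BERT features to phoneme embeddings."""
    if len(phone_emb) != len(phone_bert):
        raise ValueError("phoneme and BERT sequences must have the same length")
    fused: List[Vector] = []
    for emb, bert in zip(phone_emb, phone_bert):
        proj = project(bert, weight)
        fused.append([e + p for e, p in zip(emb, proj)])
    return fused


if __name__ == "__main__":
    # Two "words" with 3-dim toy BERT features; the first word has 2 phonemes.
    bert_per_phone = expand_word_features([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]], [2, 1])
    weight = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]  # maps 3-dim BERT space to 2-dim hidden space
    phone_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    print(fuse(phone_emb, bert_per_phone, weight))
```

The toy numbers only make the arithmetic easy to follow. In a trained model the projection weights would be learned, and the fused sequence would feed the VITS2 text encoder.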

Quick Start & Requirements

Install: The README gives no step-by-step installation instructions and instead directs users to webui_preprocess.py for guidance.
Prerequisites: Python, PyTorch. Specific version requirements are not detailed in the README.
Resources: Training custom models will likely require significant GPU resources and time.
Links:
- Demo Video: https://www.bilibili.com/video/BV18E421371Q
- Tech Slides Video: https://www.bilibili.com/video/BV1zJ4m1K7cj
- UI Project: https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI

Highlighted Details

Core ideas are inspired by MassTTS and VITS.
Aims for state-of-the-art open-source TTS quality.
Includes references to related projects like fish-speech and so-vits-svc.

Maintenance & Community

The project states it will no longer be actively maintained, recommending FishAudio's Fish-Speech as a successor.

Licensing & Compatibility

The README does not specify a license. Given the project's research focus, it appears aimed at research and non-commercial use, but this is not stated explicitly. Users should check the repository for license terms before any commercial application.

Limitations & Caveats

The project is no longer actively maintained. The README explicitly warns against using the project for any illegal purposes, particularly those violating Chinese laws, and prohibits political use.


Explore Similar Projects

LLaSM by LinkSoul-AI

SONAR by facebookresearch

ComfyUI-Qwen-TTS by flybirdxx

vits-simple-api by Artrajz

Multilingual_Text_to_Speech by Tomiinek

xtts-webui by daswer123

speech-to-speech by huggingface

voice-pro by abus-aikorea

VITS-fast-fine-tuning by Plachtaa

Qwen3-TTS by QwenLM

seamless_communication by facebookresearch

fish-speech by fishaudio