Unofficial VITS2 implementation for single-stage text-to-speech research
VITS2 is a single-stage text-to-speech (TTS) model that aims to improve naturalness and synthesis efficiency while reducing the reliance on phoneme conversion seen in prior single-stage approaches. It targets researchers and developers working on TTS systems, offering a more end-to-end solution.
How It Works
VITS2 builds on the VITS architecture with targeted architectural and training changes. Key enhancements include a duration predictor trained with adversarial learning, transformer blocks added to the normalizing flows, and a text encoder conditioned on speaker identity. Together these changes aim to synthesize more natural speech, improve multi-speaker similarity, and increase training and inference efficiency, while mitigating the strong dependence on phoneme conversion seen in earlier models.
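To illustrate the speaker-conditioning idea, here is a minimal sketch (not taken from this repository) of a text encoder that adds a learned speaker embedding to the phoneme representations before the transformer stack; the class name, layer sizes, and vocabulary sizes are illustrative assumptions, not this project's actual configuration.

```python
# Sketch only: speaker-conditioned text encoder in the spirit of VITS2.
# All hyperparameters below are assumed, not taken from this repo.
import torch
import torch.nn as nn

class SpeakerConditionedTextEncoder(nn.Module):
    def __init__(self, n_phonemes=178, n_speakers=109,
                 d_model=192, n_layers=6, n_heads=2):
        super().__init__()
        self.phoneme_emb = nn.Embedding(n_phonemes, d_model)
        self.speaker_emb = nn.Embedding(n_speakers, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=768, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, phoneme_ids, speaker_ids):
        # (batch, seq) -> (batch, seq, d_model)
        x = self.phoneme_emb(phoneme_ids)
        # Broadcast one speaker vector across all time steps of each utterance.
        x = x + self.speaker_emb(speaker_ids).unsqueeze(1)
        return self.encoder(x)

enc = SpeakerConditionedTextEncoder()
h = enc(torch.randint(0, 178, (2, 50)), torch.tensor([3, 7]))
print(h.shape)  # torch.Size([2, 50, 192])
```

Conditioning the text encoder itself, rather than only later modules, is what lets the model shape phoneme representations per speaker and is one of the levers VITS2 uses to improve multi-speaker similarity.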
Quick Start & Requirements
Install dependencies:

```bash
pip install -r requirements.txt
```

espeak-ng is also required for phonemization. Datasets (LJSpeech, VCTK, or custom) must be preprocessed into mel-spectrograms before training.
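As a sketch of that preprocessing step, the following converts a waveform into a log-mel spectrogram with torchaudio; the parameter values are common VITS-family defaults and may differ from this repository's configs, and the example path is hypothetical.

```python
# Illustrative waveform -> log-mel preprocessing (assumed hyperparameters).
import torch
import torchaudio

def wav_to_mel(path, sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80):
    wav, sr = torchaudio.load(path)  # (channels, samples)
    if sr != sample_rate:
        wav = torchaudio.functional.resample(wav, sr, sample_rate)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sample_rate, n_fft=n_fft,
        hop_length=hop_length, n_mels=n_mels)(wav)
    # Log compression with a floor to avoid log(0).
    return torch.log(torch.clamp(mel, min=1e-5))

# mel = wav_to_mel("LJSpeech-1.1/wavs/LJ001-0001.wav")  # example path
```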
Maintenance & Community
This is an unofficial implementation and a work in progress, with a TODO list of planned features and improvements. At the time of writing, the repository's last activity was about a year ago and the project appears inactive.
Licensing & Compatibility
The README does not state a license, so suitability for commercial use or closed-source linking is unspecified.
Limitations & Caveats
This is an unofficial, work-in-progress implementation; several features are still listed as "In progress" or "TODO" on the project's development roadmap.