vits2 by daniilrobnikov

Unofficial VITS2 implementation for single-stage text-to-speech research

created 2 years ago
590 stars

Top 56.0% on sourcepulse

Project Summary

VITS2 is a single-stage text-to-speech (TTS) model that aims to improve naturalness and efficiency and to reduce reliance on phoneme conversion compared with prior single-stage approaches. This repository targets researchers and developers working on TTS systems who want a more end-to-end solution.

How It Works

VITS2 builds on the VITS architecture with changes to both the model structure and the training mechanisms. The main changes are to the normalizing flows, the duration predictor, and the text encoder, which is updated to support speaker conditioning. These changes aim to synthesize more natural speech, improve speaker similarity in multi-speaker models, and increase training and inference efficiency, while mitigating the strong dependence on phoneme conversion seen in earlier models.
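To make the normalizing-flow idea concrete, here is a minimal sketch of a single affine coupling layer in NumPy. It is not code from this repository: the class name `AffineCoupling`, the tiny two-layer conditioning network, and the layer sizes are all assumptions made for illustration. It only shows the property such flows rely on, namely that the transform is exactly invertible and has a cheap log-determinant.

```python
import numpy as np


class AffineCoupling:
    """Toy affine coupling layer (illustrative, not the repository's implementation).

    The first half of the channels passes through unchanged and is used to
    predict a log-scale and shift that are applied to the second half.
    """

    def __init__(self, dim, hidden=16, seed=0):
        assert dim % 2 == 0, "dim must be even so it can be split in half"
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        # Hypothetical two-layer MLP standing in for a real conditioning network.
        self.w1 = rng.normal(scale=0.1, size=(self.half, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, 2 * self.half))

    def _params(self, xa):
        # Predict log-scale and shift from the untouched half.
        h = np.tanh(xa @ self.w1)
        log_s, t = np.split(h @ self.w2, 2, axis=-1)
        return log_s, t

    def forward(self, x):
        xa, xb = x[..., : self.half], x[..., self.half :]
        log_s, t = self._params(xa)
        yb = xb * np.exp(log_s) + t
        # The Jacobian is triangular, so log|det J| is the sum of the log-scales.
        log_det = log_s.sum(axis=-1)
        return np.concatenate([xa, yb], axis=-1), log_det

    def inverse(self, y):
        ya, yb = y[..., : self.half], y[..., self.half :]
        # ya equals xa, so the same scale and shift can be recomputed exactly.
        log_s, t = self._params(ya)
        xb = (yb - t) * np.exp(-log_s)
        return np.concatenate([ya, xb], axis=-1)
```

Calling `forward` on a batch and then `inverse` on the result should recover the input up to floating-point error, which is what lets a flow be trained in one direction and sampled in the other.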

Quick Start & Requirements

  • Installation: Clone the repository and set up a Conda environment with Python 3.11 and PyTorch 2.0. Install dependencies via pip install -r requirements.txt.
  • Prerequisites: Python 3.11, PyTorch 2.0, and espeak-ng (for phonemization). Datasets (LJ Speech, VCTK, or custom) must be preprocessed into mel-spectrograms.
  • Setup: Download and preprocess a dataset before training. Training examples are provided for LJ Speech, VCTK, and custom datasets.
  • Links: Demo, Paper.
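The mel-spectrogram preprocessing mentioned in the prerequisites can be sketched as follows. This is a NumPy-only illustration, not the repository's preprocessing script: the function names and the default parameters (22,050 Hz sample rate, 1024-point FFT, hop of 256, 80 mel bands) are assumptions chosen as typical TTS settings, so check the project's own configuration files for the real values.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a log-mel spectrogram of shape (n_mels, n_frames).

    All defaults are assumed values for illustration only.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [wav[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    # Clip before the log so silent frames do not produce -inf.
    return np.log(np.clip(mel, 1e-5, None)).T
```

In practice the waveform would be loaded from a dataset file and the result cached to disk so that training does not recompute it every epoch; this sketch leaves out both steps, as well as padding and normalization.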

Highlighted Details

  • Focuses on improving quality and efficiency of single-stage TTS.
  • Reduces dependence on phoneme conversion for a more end-to-end approach.
  • Incorporates Normalizing Flows and an improved Duration Predictor.
  • Supports multi-speaker TTS with speaker conditioning.

Maintenance & Community

This is an unofficial implementation. The project is a work in progress with a TODO list indicating planned features and improvements.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The implementation is marked as a work in progress: several features are still listed as "In progress" or "TODO" in the project's development roadmap.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 32 stars in the last 90 days
