vits2_pytorch  by p0p4k

PyTorch implementation of the VITS2 text-to-speech model

created 2 years ago
533 stars

Top 60.2% on sourcepulse

Project Summary

This repository provides an unofficial PyTorch implementation of VITS2, a single-stage text-to-speech (TTS) model designed for improved naturalness, speech characteristic similarity, and computational efficiency. It targets researchers and developers seeking to build or fine-tune advanced TTS systems, offering a fully end-to-end approach that reduces reliance on external phoneme conversion.

How It Works

VITS2 builds on VITS with several architectural changes: a transformer block inside the normalizing flow for better sequence modeling, a speaker-conditioned text encoder for multi-speaker synthesis, and a duration predictor trained with an adversarial loss. Alignment uses noise-scaled monotonic alignment search (MAS), which is intended to make duration modeling more robust.
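To make the noise-scaled monotonic alignment search concrete, here is a minimal pure-Python sketch. It is not code from this repository: the function names `monotonic_alignment` and `add_mas_noise`, the toy `log_p` matrix, and the choice to scale the noise by the standard deviation of the scores are all assumptions made for illustration. The idea shown is only that Gaussian noise is added to the token-by-frame log-likelihoods before the usual dynamic-programming search for the best monotonic path.

```python
import random
import statistics


def monotonic_alignment(log_p):
    """Return, for each frame, the index of the text token it aligns to.

    log_p[i][j] is the log-likelihood of frame j under token i. The path
    starts at token 0, ends at the last token, and at each frame either
    stays on the same token or advances by one. Assumes there are at
    least as many frames as tokens.
    """
    n_tokens, n_frames = len(log_p), len(log_p[0])
    neg_inf = float("-inf")
    # q[i][j]: best cumulative score of a path ending at (token i, frame j)
    q = [[neg_inf] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else neg_inf
            q[i][j] = max(stay, advance) + log_p[i][j]
    # Backtrack from the last token at the last frame
    path = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        if i > 0 and (i == j or q[i - 1][j - 1] >= q[i][j - 1]):
            i -= 1
    return path


def add_mas_noise(log_p, noise_scale, rng):
    """Perturb the scores with Gaussian noise (illustrative scaling choice)."""
    flat = [v for row in log_p for v in row]
    sigma = statistics.pstdev(flat) * noise_scale
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in log_p]


# Toy scores: 3 tokens, 6 frames, each token clearly matching two frames
log_p = [
    [0.0, 0.0, -5.0, -5.0, -5.0, -5.0],
    [-5.0, -5.0, 0.0, 0.0, -5.0, -5.0],
    [-5.0, -5.0, -5.0, -5.0, 0.0, 0.0],
]

rng = random.Random(0)
clean_path = monotonic_alignment(log_p)
noisy_path = monotonic_alignment(add_mas_noise(log_p, noise_scale=0.01, rng=rng))
print(clean_path)  # [0, 0, 1, 1, 2, 2]
print(noisy_path)
```

With a small `noise_scale` the perturbed search usually returns the same path as the clean one; larger values let the search land on neighboring alignments. A real training loop would run this on batched tensors rather than nested lists.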

Quick Start & Requirements

  • Install: Clone the repository and install requirements from requirements.txt.
  • Prerequisites: Python >= 3.10, PyTorch 1.13.1+, espeak.
  • Data: Download LJSpeech or VCTK datasets and create symbolic links. Preprocessing scripts are provided.
  • Training: Use train.py for single-speaker and train_ms.py for multi-speaker models, referencing configuration files.
  • ONNX Export: Scripts export_onnx.py and infer_onnx.py are available.
  • Links: Discussion Page for logs and community contributions.
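
Before installing, it can help to confirm that the listed prerequisites are visible in the current environment. The following is a small sketch under stated assumptions, not a script shipped with the repository: `check_prerequisites` is a hypothetical helper, the executable names `espeak` and `espeak-ng` are guesses at how espeak is installed on a given system, and the PyTorch check only tests that a `torch` package is installed, not that its version is 1.13.1 or newer.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Report which of the listed prerequisites are visible in this environment."""
    return {
        "python": sys.version_info >= min_python,
        # find_spec only checks that the package is installed; it does not import it
        "torch": importlib.util.find_spec("torch") is not None,
        # Assumption: the espeak executable is named "espeak" or "espeak-ng"
        "espeak": any(shutil.which(name) for name in ("espeak", "espeak-ng")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Anything reported as missing should be installed before running the preprocessing or training scripts.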

Highlighted Details

  • Implements key VITS2 features: transformer normalizing flow, speaker-conditioned text encoder, and adversarial duration predictor.
  • Supports ONNX export for efficient inference.
  • Includes Gradio demo support.
  • Offers pretrained checkpoints for LJSpeech.

Maintenance & Community

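Contributions and discussion are welcomed via GitHub issues, and the README credits contributors for feedback, guidance, and resource support. The Health Check data on this page reports the last commit as one year ago and responsiveness as inactive, so the project may no longer be actively maintained.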

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The implementation is unofficial and may not perfectly mirror the original VITS2 paper's exact configurations or performance. Some advanced features might still be under development or require expert verification.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

18 stars in the last 90 days
