VITS-fast-fine-tuning by Plachtaa

VITS pipeline for fast speaker adaptation TTS and voice conversion

created 2 years ago
4,957 stars

Top 10.2% on sourcepulse

Project Summary

This repository provides a fast fine-tuning pipeline for the VITS Text-to-Speech (TTS) model, enabling rapid speaker adaptation for both TTS synthesis and many-to-many voice conversion. It targets users who want to add custom voices to existing VITS models quickly, and it supports cloning from short clips, long recordings, and video sources.

How It Works

The project builds on VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) and focuses on efficient fine-tuning. Users adapt a pre-trained model with their own voice data so that it can synthesize speech in the new voices or convert between any speakers added to the model. The approach prioritizes speed and ease of use for speaker cloning.

Quick Start & Requirements

  • Install: Local training requires pip install -r requirements.txt and building monotonic_align. Google Colab is also supported.
  • Prerequisites: Python 3.x, ffmpeg (for voice conversion).
  • Setup: Fine-tuning takes 20 minutes to 2 hours depending on data size. The packaged inference executable is Windows-only.
  • Docs: LOCAL.md for local training, DATA.MD for data preparation.
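
Before starting a local install, it can help to confirm that the listed prerequisites are present. The following is a minimal sketch, not part of the repository: the function name check_prerequisites and its parameters are hypothetical, and it only verifies the Python version and that ffmpeg is on PATH. It does not install requirements or build monotonic_align.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 0), tools=("ffmpeg",)):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration only, not provided by the repository.
    """
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    if issues:
        for issue in issues:
            print("missing:", issue)
    else:
        print("all prerequisites found")
```

Returning a list of problems instead of raising on the first failure lets a setup script report everything that is missing in one pass.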

Highlighted Details

  • Supports cloning voices from 10+ short audio clips, long audio (>= 3 min), videos (>= 3 min), or Bilibili video links.
  • Enables many-to-many voice conversion between any speakers added to the model.
  • Offers TTS synthesis in English, Japanese, and Chinese with custom and preset characters.
  • Inference is available via a Windows executable or command-line interface.
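
The data minimums listed above (10 or more short clips, or a single recording of at least 3 minutes) can be checked before uploading or training. This is a hedged sketch using only the Python standard library. The constants, function names, and the folder name my_voice_clips are assumptions made for illustration and do not come from the repository. It reads durations from WAV headers only, so other formats would need to be converted first.

```python
import wave
from pathlib import Path

MIN_SHORT_CLIPS = 10       # "10+ short audio clips"
MIN_LONG_SECONDS = 3 * 60  # "long audio (>= 3 min)"


def wav_duration_seconds(path):
    """Duration of a WAV file, computed from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def meets_minimum(durations, mode):
    """Check clip durations (in seconds) against the stated minimums.

    mode is "short" (many short clips) or "long" (one long recording).
    """
    if mode == "short":
        return len(durations) >= MIN_SHORT_CLIPS
    if mode == "long":
        return any(d >= MIN_LONG_SECONDS for d in durations)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # "my_voice_clips" is a hypothetical folder name.
    clips = sorted(Path("my_voice_clips").glob("*.wav"))
    durations = [wav_duration_seconds(p) for p in clips]
    print("enough short clips:", meets_minimum(durations, "short"))
    print("has a long recording:", meets_minimum(durations, "long"))
```

The check is deliberately limited to the counts and durations stated in the summary; it says nothing about audio quality, which the data preparation docs may cover.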

Maintenance & Community

  • Developed with contributions from multiple authors; see Health Check for recent activity.
  • Community support available via Discord server and GitHub Issues.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The packaged inference executable is Windows-only. The README does not state a license, which leaves commercial use unclear.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 78 stars in the last 90 days
