xtts2-ui  by BoltzmannEntropy

UI for text-based voice cloning using a 10-second audio sample

created 1 year ago
356 stars

Top 79.5% on sourcepulse

Project Summary

This project provides a simple user interface for XTTS-2, a text-to-speech model that can clone a voice from as little as 10 seconds of audio. It is aimed at users who want to generate synthetic speech in multiple languages with custom voices without calling the model directly.

How It Works

The UI uses the XTTS-2 model, specifically tts_models/multilingual/multi-dataset/xtts_v2, to perform voice cloning. The user uploads or records a short audio sample (around 10 seconds) of the target voice and enters the text to speak. The system then synthesizes that text in the target voice. 16 languages are supported.
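The flow described above can be sketched directly against the Coqui TTS Python package. This is a minimal sketch and not the UI's own code: the helper names build_request and synthesize are hypothetical, and the language codes are assumptions taken from the XTTS-v2 documentation, limited to the languages this summary names. The TTS import is deferred so that the request-building step can run without the package or the model being present.

```python
from pathlib import Path

# Model identifier quoted in this summary.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"

# Assumed XTTS-v2 language codes for the languages named in this summary.
LANGUAGE_CODES = {
    "arabic": "ar", "chinese": "zh-cn", "english": "en", "french": "fr",
    "german": "de", "italian": "it", "japanese": "ja", "korean": "ko",
    "portuguese": "pt", "russian": "ru", "spanish": "es", "turkish": "tr",
}


def build_request(text, speaker_wav, language, out_path="output.wav"):
    """Validate the inputs and return keyword arguments for tts_to_file."""
    if not text.strip():
        raise ValueError("text must not be empty")
    code = LANGUAGE_CODES.get(language.lower())
    if code is None:
        raise ValueError(f"unsupported language: {language}")
    return {
        "text": text,
        "speaker_wav": str(Path(speaker_wav)),  # ~10 s sample of the target voice
        "language": code,
        "file_path": str(Path(out_path)),
    }


def synthesize(request):
    """Run XTTS-2 on a request; the model is downloaded on first use."""
    from TTS.api import TTS  # imported lazily; requires `pip install TTS`

    tts = TTS(MODEL_NAME)
    tts.tts_to_file(**request)
    return request["file_path"]


if __name__ == "__main__":
    # "sample.wav" is a placeholder for your own recording.
    req = build_request("Hello from a cloned voice.", "sample.wav", "English")
    print(req)
    # synthesize(req)  # uncomment once TTS and PyTorch are installed
```

Keeping validation separate from synthesis means a bad language name or empty text fails immediately, before the model is loaded.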

Quick Start & Requirements

  • Install: Clone the repository, create a Python virtual environment, and install dependencies using pip install -r requirements.txt. Upgrade the TTS package with pip install --upgrade TTS.
  • PyTorch: Must be installed separately. The README gives install commands for GPUs compatible with CUDA 12.1 (cu121) and CUDA 11.8 (cu118). Users without a GPU should follow PyTorch's official instructions.
  • Japanese Support: Requires installing fugashi and potentially downloading the unidic dictionary.
  • Setup Time: Estimated to be under 30 minutes, depending on PyTorch and model download times.
  • Docs: XTTS-v2 on Hugging Face
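As a rough illustration of the setup steps above, the sketch below builds a PyTorch install command for the two CUDA variants mentioned and reports which packages are still missing from the current environment. The function names are hypothetical, and the exact commands in the README may differ. The index URLs follow PyTorch's standard wheel index layout, and falling back to a plain pip install when no CUDA tag is given is an assumption, not the README's instruction.

```python
import importlib.util

# PyTorch wheel indexes for the CUDA variants named in this summary.
PYTORCH_INDEX = {
    "cu121": "https://download.pytorch.org/whl/cu121",
    "cu118": "https://download.pytorch.org/whl/cu118",
}


def torch_install_command(cuda_tag=None):
    """Return a pip command (as an argument list) for installing PyTorch."""
    cmd = ["pip", "install", "torch", "torchaudio"]
    if cuda_tag is not None:
        if cuda_tag not in PYTORCH_INDEX:
            raise ValueError(f"unknown CUDA tag: {cuda_tag}")
        cmd += ["--index-url", PYTORCH_INDEX[cuda_tag]]
    return cmd


def missing_packages(names=("torch", "TTS", "fugashi")):
    """List the top-level packages that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(" ".join(torch_install_command("cu121")))
    print("Missing:", missing_packages())
```

Running the script inside the virtual environment after pip install -r requirements.txt gives a quick check of whether PyTorch, TTS, and (for Japanese) fugashi are importable.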

Highlighted Details

  • Supports voice cloning with just 10 seconds of audio.
  • Works in 16 languages, including Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish.
  • Includes built-in voice recording and uploading capabilities.
  • Models are downloaded automatically on first use.

Maintenance & Community

The project is based on kanttouchthis/text_generation_webui_xtts. Further community and roadmap information is not explicitly detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license for the UI project itself. However, it references the XTTS-v2 model, which is subject to the Coqui Public Model License (CPML), available at https://coqui.ai/cpml.txt. Users must agree to these terms.

Limitations & Caveats

The README notes that output quality is not "EL level" (presumably a reference to ElevenLabs) and may not meet all expectations. Users must explicitly agree to the terms of service before the XTTS model can be used. The README also mentions that the model may be re-downloaded unexpectedly, referencing GitHub Issue 4723.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

19 stars in the last 90 days
