UI for text-based voice cloning using a 10-second audio sample
This project provides a user-friendly interface for XTTS-2, a text-to-speech model capable of voice cloning with as little as 10 seconds of audio. It targets users who need to generate synthetic speech in multiple languages with custom voices, offering a simplified workflow compared to direct model interaction.
How It Works
The UI leverages the XTTS-2 model, specifically `tts_models/multilingual/multi-dataset/xtts_v2`, to perform voice cloning. Users upload or record a short audio sample (around 10 seconds) of the target voice and provide text input; the system then synthesizes speech in the target voice, supporting 16 languages.
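As a rough sketch of how this maps onto the underlying library, the Coqui TTS Python API can drive the same checkpoint directly. The file paths and text below are illustrative placeholders, not part of this project:

```python
# Minimal voice-cloning sketch using the Coqui TTS API and the same
# XTTS-v2 checkpoint the UI wraps. Paths and text are placeholders.
import torch
from TTS.api import TTS

device = "cuda" if torch.cuda.is_available() else "cpu"

# Downloads the model on first use; Coqui's CPML terms must be accepted.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Clone the voice from a ~10-second reference clip and synthesize the text.
tts.tts_to_file(
    text="Hello, this is a cloned voice.",
    speaker_wav="reference_voice.wav",  # short sample of the target voice
    language="en",                      # one of the 16 supported languages
    file_path="output.wav",
)
```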
Quick Start & Requirements
Install the dependencies with `pip install -r requirements.txt`, then upgrade the TTS package with `pip install --upgrade TTS`. PyTorch builds for CUDA 12.1 (`cu121`) and CUDA 11.8 (`cu118`) compatible GPUs are supported; users without a GPU should follow PyTorch's official installation instructions. Japanese support requires installing `fugashi` and potentially downloading the `unidic` dictionary.
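A quick sanity check, assuming the packages above installed cleanly, is to confirm the installed TTS version and whether PyTorch sees a CUDA device:

```python
# Environment sanity check; assumes torch and TTS are installed as above.
from importlib.metadata import version
import torch

print("TTS version:", version("TTS"))
print("CUDA available:", torch.cuda.is_available())
print("Torch CUDA build:", torch.version.cuda)  # e.g. "12.1" for cu121 wheels
```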
Highlighted Details
Maintenance & Community
The project is based on kanttouchthis/text_generation_webui_xtts. Further community and roadmap information is not explicitly detailed in the README.
Licensing & Compatibility
The README does not explicitly state the license for this UI project. However, it references the XTTS-v2 model, which is subject to Coqui's Commercial Product License Agreement (CPML), accessible at https://coqui.ai/cpml.txt. Users must agree to these terms.
Limitations & Caveats
The README notes that output quality is not "EL level" (presumably referring to ElevenLabs) and may not meet all expectations. Users must explicitly agree to the terms of service to use the XTTS model. The README also mentions a potential model re-downloading issue, referencing GitHub Issue 4723.