FastAPI server for XTTSv2 text-to-speech
This project provides a FastAPI server for the XTTSv2 text-to-speech model, targeting users who need a programmatic interface for voice generation, particularly those integrating with applications like SillyTavern. It offers a flexible way to leverage XTTSv2's capabilities, including voice cloning and multi-language support, with options for performance optimization.
How It Works
The server wraps the XTTSv2 model in a FastAPI application, exposing endpoints for text-to-speech generation. Models can be loaded locally or fetched via an API, with options to specify the model version. Performance can be tuned with --deepspeed for accelerated inference or --lowvram for a reduced GPU memory footprint. A streaming mode is available for near real-time audio output, with an improved variant for complex languages.
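For programmatic use, generation is requested over HTTP. Below is a minimal client sketch assuming the server is running on the default port 8020; the tts_to_file endpoint is mentioned in this project, but the JSON field names and values shown are assumptions, so confirm the exact request schema in the interactive docs at http://localhost:8020/docs.

```python
import requests

# Minimal sketch of a client call to the server's tts_to_file endpoint.
# The endpoint name comes from this project; the JSON field names below
# are assumptions -- check http://localhost:8020/docs for the authoritative schema.
payload = {
    "text": "Hello from XTTSv2.",          # text to synthesize
    "speaker_wav": "example_voice",        # assumed: reference voice sample for cloning
    "language": "en",                      # assumed: target language code
    "file_name_or_path": "output.wav",     # assumed: where the server writes the audio
}

response = requests.post("http://localhost:8020/tts_to_file", json=payload)
response.raise_for_status()
print("Server replied:", response.text)
```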
Quick Start & Requirements
- Install the server: `pip install xtts-api-server`
- Install PyTorch with CUDA 11.8 support (required): `pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118`
- On Debian/Ubuntu, install system dependencies: `sudo apt install -y python3-dev python3-venv portaudio19-dev`
- Start the server: `python -m xtts_api_server`
- The interactive API docs are served at http://localhost:8020/docs
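After starting the server, a quick way to verify it is reachable is to request the docs page; this small sketch assumes the default host and port shown above.

```python
import requests

# Sanity check: FastAPI serves interactive API docs at /docs on the default port 8020.
resp = requests.get("http://localhost:8020/docs", timeout=5)
if resp.ok:
    print("Server is up; browse the API at http://localhost:8020/docs")
else:
    print("Unexpected status code:", resp.status_code)
```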
Highlighted Details
- --deepspeed enables a 2-3x processing speedup.
- --streaming-mode provides near real-time audio playback (client sketch below).
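A rough idea of how a streaming client might consume audio is sketched below. The route name /tts_stream and its query parameters are assumptions (this summary only confirms that --streaming-mode exists), so verify the actual streaming route in the server's /docs when running in that mode.

```python
import requests

# Hypothetical streaming client: the /tts_stream route and its query parameters are
# assumptions, not confirmed by this summary -- check the server's /docs in streaming mode.
params = {
    "text": "Audio is consumed chunk by chunk as it is generated.",
    "speaker_wav": "example_voice",   # assumed parameter name
    "language": "en",                 # assumed parameter name
}

with requests.get("http://localhost:8020/tts_stream", params=params, stream=True) as resp:
    resp.raise_for_status()
    with open("streamed_output.wav", "wb") as out:
        for chunk in resp.iter_content(chunk_size=4096):
            out.write(chunk)
```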
Maintenance & Community
The project acknowledges contributions from Kolja Beigel (RealtimeTTS), erew123, and lendot. The author notes limited time for active development, suggesting users explore similar projects.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Code usage is permitted for personal needs, and PRs are welcome.
Limitations & Caveats
Streaming mode has limitations: it works only locally and does not support the tts_to_file endpoint. The author advises users to check a similar project for alternative XTTS implementations.