xtts-api-server  by daswer123

FastAPI server for XTTSv2 text-to-speech

created 1 year ago
527 stars

Top 60.8% on sourcepulse

Project Summary

This project provides a FastAPI server for the XTTSv2 text-to-speech model, targeting users who need a programmatic interface for voice generation, particularly those integrating with applications like SillyTavern. It offers a flexible way to leverage XTTSv2's capabilities, including voice cloning and multi-language support, with options for performance optimization.

How It Works

The server wraps the XTTSv2 model in a FastAPI application and exposes endpoints for text-to-speech generation. Models can be loaded locally or via an API, and a specific model version can be selected. Two flags tune performance: --deepspeed speeds up inference (the project cites a 2-3x gain), and --lowvram reduces the memory footprint. A streaming mode plays audio in near real time, and an improved variant of it is offered for complex languages.
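As a rough illustration of how these options combine, the sketch below assembles a launch command from the flags named in this summary (--deepspeed, --lowvram, --streaming-mode). The helper name build_launch_command is hypothetical and not part of the project. Whether the flags can all be used together is not stated here, so the helper only builds the argument list and does not validate combinations.

```python
import subprocess
import sys


def build_launch_command(deepspeed=False, lowvram=False, streaming_mode=False):
    """Build the argv list for starting the server as a Python module.

    Hypothetical helper: only flags mentioned in this summary are used.
    """
    cmd = [sys.executable, "-m", "xtts_api_server"]
    if deepspeed:
        cmd.append("--deepspeed")       # faster inference
    if lowvram:
        cmd.append("--lowvram")         # smaller memory footprint
    if streaming_mode:
        cmd.append("--streaming-mode")  # near real-time playback
    return cmd


if __name__ == "__main__":
    command = build_launch_command(deepspeed=True)
    print("Launching:", " ".join(command))
    # Blocks until the server process exits.
    subprocess.run(command, check=True)
```

Using sys.executable keeps the server in the same Python environment where the package was installed.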

Quick Start & Requirements

  • Install via pip: pip install xtts-api-server
  • For GPU acceleration (recommended), install the CUDA 11.8 builds of PyTorch: pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118
  • Linux users may need sudo apt install -y python3-dev python3-venv portaudio19-dev.
  • Server launch: python -m xtts_api_server
  • Docker images are available.
  • API Docs: http://localhost:8020/docs
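Once the server is running, a client can talk to it over HTTP. The sketch below prepares a POST request for the tts_to_file endpoint mentioned in this summary, using the port from the API docs URL. The JSON field names (text, speaker_wav, language, file_name_or_path) and the exact endpoint path are assumptions. Check them against http://localhost:8020/docs before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8020"  # host and port taken from the API docs URL


def build_tts_to_file_request(text, speaker_wav, language, out_path,
                              base_url=BASE_URL):
    """Prepare (but do not send) a POST request for the tts_to_file endpoint.

    The payload field names are assumptions; verify them in the /docs page.
    """
    payload = {
        "text": text,
        "speaker_wav": speaker_wav,     # assumed: voice sample to clone
        "language": language,           # assumed: language code such as "en"
        "file_name_or_path": out_path,  # assumed: where the server writes audio
    }
    return urllib.request.Request(
        f"{base_url}/tts_to_file",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_tts_to_file_request("Hello there.", "sample.wav", "en", "out.wav")
    # Sending requires a running server on localhost:8020.
    with urllib.request.urlopen(request, timeout=120) as response:
        print(response.status)
```

Separating request construction from sending makes the payload easy to inspect and adjust once the real schema is confirmed.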

Highlighted Details

  • Supports voice cloning from provided audio samples.
  • Offers --deepspeed for 2-3x processing speedup.
  • --streaming-mode provides near real-time audio playback.
  • Can load custom XTTSv2 models from a local folder.

Maintenance & Community

The project acknowledges contributions from Kolja Beigel (RealtimeTTS), erew123, and lendot. The author notes limited time for active development, suggesting users explore similar projects.

Licensing & Compatibility

The README does not state an explicit license. The author permits use of the code for personal needs and welcomes pull requests.

Limitations & Caveats

Streaming mode works only locally and does not support the tts_to_file endpoint. The author also points users to a similar project for alternative XTTS implementations.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 30 stars in the last 90 days
