OpenAI API-compatible server for text-to-speech
This project provides an OpenAI API-compatible text-to-speech server, enabling users to run their own private TTS service. It supports both fast CPU-based generation via Piper TTS and high-quality, voice-cloning capabilities using Coqui AI's XTTS v2, targeting developers and users who need a self-hosted, flexible TTS solution.
How It Works
The server exposes an endpoint mirroring OpenAI's /v1/audio/speech API. It leverages Piper TTS for rapid, CPU-bound speech synthesis, allowing for custom voice mapping. For higher fidelity and voice cloning, it integrates Coqui XTTS v2, which requires a GPU with approximately 4GB VRAM. XTTS v2 offers multilingual support with automatic language detection and the ability to use custom fine-tuned models.
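As a rough illustration of what "mirroring OpenAI's /v1/audio/speech" means for a client, the sketch below builds and sends a request using only the Python standard library. The JSON field names (model, input, voice, response_format, speed) follow OpenAI's speech API. The base URL http://localhost:8000 and the helper names build_speech_request and synthesize are assumptions made for this example, not taken from the project.

```python
import json
import urllib.request

# Formats listed in this summary; the server may accept others.
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "pcm"}


def build_speech_request(text, base_url="http://localhost:8000", model="tts-1",
                         voice="alloy", response_format="mp3", speed=1.0):
    """Build (but do not send) a POST request for /v1/audio/speech.

    base_url is an assumed local address; adjust it to your deployment.
    """
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

With a server running, a call such as synthesize("Hello from a self-hosted server.", "hello.mp3") would save the audio to disk. Because the request shape matches OpenAI's, an existing OpenAI client pointed at the self-hosted base URL should work in much the same way.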
Quick Start & Requirements
Default (Nvidia GPU, CUDA): docker compose up
AMD GPU (ROCm): docker compose -f docker-compose.rocm.yml up
Minimal (Piper only, CPU): docker compose -f docker-compose.min.yml up
Requires curl and ffmpeg, plus an Nvidia GPU with CUDA or an AMD GPU with ROCm for the respective backends.
Highlighted Details
Supports tts-1 and tts-1-hd models with configurable voices (alloy, echo, fable, etc.).
Outputs mp3, opus, aac, flac, and pcm formats with adjustable speech speed.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats