openedai-speech by matatonic

OpenAI API-compatible server for text-to-speech

created 1 year ago
797 stars

Top 45.1% on sourcepulse

Project Summary

This project provides an OpenAI API-compatible text-to-speech server that lets users run their own private TTS service. It supports fast CPU-based generation via Piper TTS and higher-quality synthesis with voice cloning via Coqui AI's XTTS v2. It targets developers and users who need a self-hosted, flexible TTS solution.

How It Works

The server exposes an endpoint mirroring OpenAI's /v1/audio/speech API. It uses Piper TTS for fast, CPU-based speech synthesis and supports custom voice mapping. For higher fidelity and voice cloning, it integrates Coqui XTTS v2, which requires a GPU with approximately 4GB of VRAM. XTTS v2 offers multilingual support with automatic language detection and can use custom fine-tuned models.
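As a rough illustration of what "mirroring OpenAI's /v1/audio/speech API" means for a client, the sketch below builds a request using only the Python standard library. The base URL (`http://localhost:8000`) and the helper names `build_speech_request` and `synthesize` are assumptions made for this example, not taken from the project; the `model`, `input`, and `voice` fields follow OpenAI's speech API.

```python
import json
import urllib.request

# Assumption: the server listens on localhost:8000; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice="alloy", model="tts-1", base_url=BASE_URL):
    """Build a POST request for the OpenAI-style /v1/audio/speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text, **kwargs)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path
```

With a server running, `synthesize("Hello from a self-hosted TTS server.")` would write the audio to `speech.mp3`. Clients built on OpenAI's SDK may additionally expect an API key to be configured.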

Quick Start & Requirements

  • Install/Run: Docker Compose is recommended.
    • Nvidia GPU: docker compose up
    • AMD GPU (ROCm): docker compose -f docker-compose.rocm.yml up
    • CPU only (Piper): docker compose -f docker-compose.min.yml up
  • Prerequisites: Python 3.9-3.11 (Piper is not compatible with 3.12), curl, and ffmpeg. The GPU backends need an Nvidia GPU with CUDA or an AMD GPU with ROCm, respectively.
  • Setup: Initial voice model downloads may take time.
  • Docs: OpenAI Text to speech guide, Custom Voices Howto, Piper
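The choices in the list above can be sketched as a small launcher helper. The compose file names and the Python version range come from the list; the backend keys (`"nvidia"`, `"rocm"`, `"cpu"`) and the function names are hypothetical labels chosen for this example.

```python
import sys

# Compose files named in the Quick Start list; None means the default file.
# The dictionary keys are hypothetical labels, not project options.
COMPOSE_FILES = {
    "nvidia": None,
    "rocm": "docker-compose.rocm.yml",
    "cpu": "docker-compose.min.yml",
}


def compose_command(backend):
    """Return the docker compose argument list for a backend key."""
    if backend not in COMPOSE_FILES:
        raise ValueError(f"unknown backend: {backend!r}")
    cmd = ["docker", "compose"]
    compose_file = COMPOSE_FILES[backend]
    if compose_file is not None:
        cmd += ["-f", compose_file]
    return cmd + ["up"]


def python_supported(version=sys.version_info):
    """True if the (major, minor) version falls in the listed 3.9-3.11 range."""
    return (3, 9) <= tuple(version[:2]) <= (3, 11)
```

The returned list can be passed to `subprocess.run(compose_command("cpu"), check=True)`. The version check only matters when installing outside Docker.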

Highlighted Details

  • Supports OpenAI's tts-1 and tts-1-hd models with configurable voices (alloy, echo, fable, etc.).
  • Outputs in mp3, opus, aac, flac, and pcm formats with adjustable speech speed.
  • Features streamed audio output during generation and an optional idle unload timer for models.
  • Allows custom voice cloning with as little as 6 seconds of clear audio and supports custom fine-tuned XTTS models.
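To make the format, speed, and streaming options above concrete, here is a minimal client-side sketch. The format names come from the list; the speed bounds (0.25 to 4.0) are the ones documented for OpenAI's API and are only assumed to apply to this server. `speech_options` and `copy_stream` are hypothetical helper names.

```python
# Formats listed above. The speed bounds are those documented for OpenAI's
# API and are an assumption for this server.
FORMATS = {"mp3", "opus", "aac", "flac", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def speech_options(response_format="mp3", speed=1.0):
    """Validate and return the optional fields of a speech request body."""
    if response_format not in FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed out of range: {speed}")
    return {"response_format": response_format, "speed": speed}


def copy_stream(source, sink, chunk_size=4096):
    """Copy audio from source to sink chunk by chunk; return bytes written.

    Reading in chunks lets a client start writing (or playing) audio
    before the server has finished generating the whole response.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        sink.write(chunk)
        total += len(chunk)
    return total
```

`source` can be any object with a `read(n)` method, such as the response returned by `urllib.request.urlopen`, and `sink` any writable binary file.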

Maintenance & Community

  • Notice: The project states it is "mostly obsolete and will no longer be updated."
  • Recent activity (August 2024) includes Docker build fixes and refactoring.
  • Alternatives are listed in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The project is marked as "mostly obsolete and will no longer be updated."
  • Piper TTS does not install on Python 3.12.
  • XTTS on ARM64 (Apple M-series, Raspberry Pi) only has CPU support, which is noted as "very slow."
  • Running more than 2-3 simultaneous XTTS streams may cause audio underruns.
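One way a client application might work around the stream limit above is to cap its own concurrent XTTS requests. The sketch below uses a semaphore for that; it is a client-side workaround, not a feature of the server, and the cap of 2 and the name `with_stream_slot` are assumptions for this example.

```python
import threading

# Assumption: 2 is a conservative cap derived from the "2-3 streams" caveat.
MAX_XTTS_STREAMS = 2
_slots = threading.BoundedSemaphore(MAX_XTTS_STREAMS)


def with_stream_slot(fn, *args, timeout=None, **kwargs):
    """Run fn while holding a stream slot.

    Blocks until a slot is free, or raises TimeoutError if timeout
    (in seconds) elapses first.
    """
    if not _slots.acquire(timeout=timeout):
        raise TimeoutError("no free XTTS stream slot")
    try:
        return fn(*args, **kwargs)
    finally:
        _slots.release()
```

Wrapping each synthesis call, for example `with_stream_slot(send_request, text)` with `send_request` being whatever function issues the HTTP call, keeps at most `MAX_XTTS_STREAMS` requests in flight from that process.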

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1

Star History

  • 56 stars in the last 90 days