openedai-speech by matatonic

OpenAI API-compatible server for text-to-speech

Created 2 years ago

842 stars

Top 42.3% on SourcePulse

Project Summary

This project provides an OpenAI API-compatible text-to-speech server that lets users run their own private TTS service. It supports fast CPU-based generation via Piper TTS and high-quality generation with voice cloning via Coqui AI's XTTS v2. It targets developers and users who need a flexible, self-hosted TTS solution.

How It Works

The server exposes an endpoint that mirrors OpenAI's /v1/audio/speech API. It uses Piper TTS for fast, CPU-based speech synthesis with custom voice mapping. For higher fidelity and voice cloning, it integrates Coqui XTTS v2, which requires a GPU with approximately 4GB of VRAM. XTTS v2 supports multiple languages with automatic language detection and can load custom fine-tuned models.
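Because the endpoint mirrors OpenAI's speech API, a client only needs to POST a small JSON body. Below is a minimal standard-library sketch under stated assumptions: the server is reachable at `http://localhost:8000/v1` (adjust for your deployment), it accepts OpenAI's request fields (`model`, `input`, `voice`, `response_format`, `speed`), and `tts-1` selects Piper while `tts-1-hd` selects XTTS v2. `build_speech_request` and `save_audio` are hypothetical helper names, not part of the project.

```python
import json
import urllib.request
from typing import BinaryIO

# Assumption: default local deployment address; change host/port as needed.
DEFAULT_BASE_URL = "http://localhost:8000/v1"


def build_speech_request(
    text: str,
    voice: str = "alloy",
    model: str = "tts-1",  # assumed: "tts-1" -> Piper, "tts-1-hd" -> XTTS v2
    response_format: str = "mp3",
    speed: float = 1.0,
    base_url: str = DEFAULT_BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the OpenAI-style speech route."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_audio(response: BinaryIO, out: BinaryIO, chunk_size: int = 16 * 1024) -> int:
    """Copy audio bytes from `response` to `out` chunk by chunk; return bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            break
        out.write(chunk)
        total += len(chunk)
    return total
```

To send the request, pass it to `urllib.request.urlopen` and hand the response object to `save_audio` together with a file opened in binary write mode. Reading in fixed-size chunks lets audio be written while the server is still streaming it, rather than only after generation finishes.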

Quick Start & Requirements

Install/Run: Docker Compose is recommended.
- Nvidia GPU: docker compose up
- AMD GPU (ROCm): docker compose -f docker-compose.rocm.yml up
- CPU only (Piper): docker compose -f docker-compose.min.yml up
Prerequisites: Python 3.9-3.11 (Piper is not compatible with Python 3.12), curl, and ffmpeg. The GPU backends require an Nvidia GPU with CUDA or an AMD GPU with ROCm, respectively.
Setup: The initial voice model downloads may take some time.
Docs: OpenAI Text to speech guide, Custom Voices Howto, Piper
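For a manual (non-Docker) install, the prerequisites above can be checked up front. This is a hypothetical pre-flight sketch, not a script shipped with the project; `python_ok` and `missing_tools` are illustrative names, and the version range and tool list are taken directly from the stated prerequisites.

```python
import shutil
import sys
from typing import List, Optional, Sequence

# Supported range from the stated prerequisites (Piper does not install on 3.12).
PYTHON_RANGE = ((3, 9), (3, 11))
REQUIRED_TOOLS = ("curl", "ffmpeg")


def python_ok(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the (major, minor) version falls inside PYTHON_RANGE."""
    major_minor = tuple((version or sys.version_info)[:2])
    low, high = PYTHON_RANGE
    return low <= major_minor <= high


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A setup script could print `missing_tools()` and stop early when `python_ok()` is false. The Docker Compose route bundles its own environment, so the check mainly matters for installing directly on the host.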

Highlighted Details

Supports OpenAI's tts-1 and tts-1-hd models with configurable voices (alloy, echo, fable, etc.).
Outputs in mp3, opus, aac, flac, and pcm formats with adjustable speech speed.
Streams audio output as it is generated and can optionally unload models after an idle timeout.
Allows custom voice cloning with as little as 6 seconds of clear audio and supports custom fine-tuned XTTS models.
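The model names and output formats listed above can be validated on the client before a request is sent. This is a hypothetical sketch: `SpeechOptions` is an illustrative class, the model and format sets come from the list above, and the 0.25-4.0 speed range is an assumption borrowed from OpenAI's documented API that the server may not enforce identically.

```python
from dataclasses import dataclass
from typing import Dict, List

SUPPORTED_MODELS = frozenset({"tts-1", "tts-1-hd"})
SUPPORTED_FORMATS = frozenset({"mp3", "opus", "aac", "flac", "pcm"})
# Assumption: OpenAI's documented speed range; the server's actual limits may differ.
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass(frozen=True)
class SpeechOptions:
    """Hypothetical client-side container for speech request options."""

    model: str = "tts-1"
    voice: str = "alloy"
    response_format: str = "mp3"
    speed: float = 1.0

    def problems(self) -> List[str]:
        """Return human-readable validation errors (empty list if valid)."""
        errors: List[str] = []
        if self.model not in SUPPORTED_MODELS:
            errors.append(f"unknown model {self.model!r}")
        if self.response_format not in SUPPORTED_FORMATS:
            errors.append(f"unsupported format {self.response_format!r}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            errors.append(f"speed {self.speed} outside [{MIN_SPEED}, {MAX_SPEED}]")
        return errors

    def to_payload(self, text: str) -> Dict[str, object]:
        """Build the JSON body, raising ValueError if any option is invalid."""
        errors = self.problems()
        if errors:
            raise ValueError("; ".join(errors))
        return {
            "model": self.model,
            "input": text,
            "voice": self.voice,
            "response_format": self.response_format,
            "speed": self.speed,
        }
```

The voice is deliberately not checked, since custom and cloned voices mean the valid set depends on the server's configuration rather than a fixed list.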

Maintenance & Community

Notice: The project states it is "mostly obsolete and will no longer be updated."
Recent activity (August 2024) includes Docker build fixes and refactoring.
Alternatives are listed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is marked as "mostly obsolete and will no longer be updated."
Piper TTS does not install on Python 3.12.
On ARM64 (Apple M-series, Raspberry Pi), XTTS runs on the CPU only, which the project notes is "very slow."
Running more than 2-3 simultaneous XTTS streams may cause audio underruns.
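Given that stream limit, a client or proxy in front of the server could cap how many XTTS requests run at once. Below is a minimal sketch of that idea using a bounded semaphore; `StreamLimiter` is a hypothetical helper, not an openedai-speech feature, and the cap of 2 is an assumption taken from the low end of the 2-3 stream figure.

```python
import threading
from contextlib import contextmanager
from typing import Any, Callable, Iterator, TypeVar

T = TypeVar("T")

# Assumption: cap at 2, the low end of the 2-3 stream range noted above.
MAX_XTTS_STREAMS = 2


class StreamLimiter:
    """Let at most `limit` callers stream at once; the rest block until a slot frees."""

    def __init__(self, limit: int = MAX_XTTS_STREAMS) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()  # guards the counters below
        self.active = 0
        self.peak = 0

    @contextmanager
    def slot(self) -> Iterator[None]:
        with self._slots:  # blocks while `limit` streams are already running
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                yield
            finally:
                with self._lock:
                    self.active -= 1

    def run(self, fn: Callable[..., T], *args: Any) -> T:
        """Call fn(*args) while holding one of the slots."""
        with self.slot():
            return fn(*args)
```

Each call to the server would be wrapped in `limiter.run(...)` or `with limiter.slot():`, so a third caller waits for a free slot instead of starting a stream that might underrun.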

Health Check

Last Commit: 11 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 8 stars in the last 30 days

Explore Similar Projects

cosyvoice-api by jianchang512

API for text-to-speech using CosyVoice

Created 1 year ago

Updated 4 months ago

Auralis by astramind-ai

TTS engine for fast voice cloning

Created 1 year ago

Updated 11 months ago

xtts2-ui by BoltzmannEntropy

UI for text-based voice cloning using a 10-second audio sample

Created 2 years ago

Updated 1 year ago

Open-VoiceCanvas by ItusiAI

Open-source text-to-speech (TTS) platform with Stripe payment support

Created 10 months ago

Updated 3 days ago

tts by zuoban

TTS service for voice synthesis using Microsoft Azure

Created 1 year ago

Updated 2 weeks ago

xtts-api-server by daswer123

FastAPI server for XTTSv2 text-to-speech

Created 2 years ago

Updated 1 year ago

xtts-webui by daswer123

WebUI for XTTS, a text-to-speech model, and fine-tuning

Created 2 years ago

Updated 11 months ago

Speech-AI-Forge by lenML

TTS API server and Gradio WebUI

Created 1 year ago

Updated 3 months ago

alltalk_tts by erew123

Text-to-speech tool based on Coqui TTS engine

Created 2 years ago

Updated 2 days ago

easyVoice by cosin2077

Text-to-speech tool for long texts and multi-character dubbing

Created 10 months ago

Updated 8 months ago

tts by wangwangit

AI platform for seamless voice and text processing

Created 5 months ago

Updated 4 months ago

Starred by Abubakar Abid (Cofounder of Gradio)

voice-pro by abus-aikorea

WebUI for speech recognition, translation, and dubbing

Created 1 year ago

Updated 1 month ago
