Text-to-speech server with OpenAI-compatible API and web UI
This project provides a high-performance Text-to-Speech (TTS) server with an OpenAI-compatible API, targeting developers and users seeking to integrate advanced TTS capabilities into applications or chatbots. It offers multilingual support, emotion tags, and a modern web UI, optimized for RTX GPUs.
How It Works
Orpheus-FastAPI acts as a frontend that interfaces with an external LLM inference server running the Orpheus model. It sends text prompts to the inference server, which generates tokens. These tokens are then processed by the SNAC model to produce audio. The system is optimized for RTX GPUs using vectorized tensor operations, parallel processing with CUDA streams, efficient memory management, and intelligent batching. It also features automatic hardware detection to adapt performance settings for different GPU and CPU configurations.
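The following is a minimal, illustrative sketch of the text-to-tokens step described above, assuming a llama.cpp-style inference server exposing a /completion endpoint on localhost; the prompt format, URL, and parameter values are assumptions, not the project's exact implementation, and the SNAC decoding step is only indicated in a comment.

```python
# Sketch: send text to the external inference server and collect the raw
# token output that would later be decoded to audio by SNAC.
import requests

INFERENCE_URL = "http://127.0.0.1:8080/completion"  # assumed llama.cpp-style server

def generate_speech_tokens(text: str, voice: str = "tara") -> str:
    """Send a formatted prompt to the inference server and return the raw token text."""
    payload = {
        "prompt": f"{voice}: {text}",  # illustrative prompt format, not the project's exact template
        "n_predict": 1024,
        "temperature": 0.6,
    }
    resp = requests.post(INFERENCE_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["content"]

# The returned custom audio tokens would then be parsed and passed through the
# SNAC decoder to produce the final waveform (not shown here).
```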
Quick Start & Requirements
Run with Docker Compose: docker compose -f docker-compose-gpu.yml up on GPU systems, or docker compose -f docker-compose-cpu.yml up for CPU-only setups. The server requires llama.cpp or a compatible inference server hosting the Orpheus model; for a manual setup, start the web server with python app.py. A usage example follows below.
Highlighted Details
The server exposes an OpenAI-compatible /v1/audio/speech endpoint and supports inline emotion tags (e.g. <laugh>).
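The sketch below shows how a client might call the OpenAI-compatible /v1/audio/speech endpoint using the official openai Python client; the host, port, model name, and voice are assumptions to adjust for your deployment.

```python
# Sketch: request speech from the OpenAI-compatible endpoint and save the audio.
from openai import OpenAI

# Point the client at the local Orpheus-FastAPI server (port is an assumption).
client = OpenAI(base_url="http://127.0.0.1:5005/v1", api_key="not-needed")

response = client.audio.speech.create(
    model="orpheus",   # illustrative model name
    voice="tara",      # illustrative voice name
    input="Hello there <laugh>, this is a quick test.",  # emotion tags go inline in the text
)

# Write the returned audio bytes to disk.
with open("speech.wav", "wb") as f:
    f.write(response.content)
```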
Maintenance & Community
The project has recent updates (v1.3.0 as of April 2025) including new languages, voices, and Docker Compose support. Contributions are acknowledged.
Licensing & Compatibility
Licensed under the Apache License 2.0, permitting commercial use and linking with closed-source applications.
Limitations & Caveats
Python 3.12 is not supported due to dependency issues. While long-form audio is supported, slight discontinuities between segments may occur due to architectural constraints of the underlying model. The repetition penalty is hardcoded to 1.1.