Orpheus-FastAPI by Lex-au

Text-to-speech server with OpenAI-compatible API and web UI

Created 4 months ago · 485 stars · Top 64.2% on sourcepulse

Project Summary

This project provides a high-performance Text-to-Speech (TTS) server with an OpenAI-compatible API, targeting developers and users seeking to integrate advanced TTS capabilities into applications or chatbots. It offers multilingual support, emotion tags, and a modern web UI, optimized for RTX GPUs.

How It Works

Orpheus-FastAPI acts as a frontend that interfaces with an external LLM inference server running the Orpheus model. It sends text prompts to the inference server, which generates tokens. These tokens are then processed by the SNAC model to produce audio. The system is optimized for RTX GPUs using vectorised tensor operations, parallel processing with CUDA streams, efficient memory management, and intelligent batching. It also features automatic hardware detection to adapt performance settings for different GPU and CPU configurations.
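
A minimal sketch of that flow, assuming a llama.cpp-style /completion endpoint on localhost:8080; the URL, payload fields, and token handling here are illustrative assumptions, not the project's actual code:

    import requests

    # Sketch of the prompt -> inference server -> audio tokens flow described above.
    # The real server adds streaming, batching, and CUDA-specific optimisations.
    INFERENCE_URL = "http://127.0.0.1:8080/completion"  # assumed llama.cpp-style server

    def request_audio_tokens(prompt: str) -> str:
        """Send a text prompt to the external inference server running Orpheus."""
        resp = requests.post(
            INFERENCE_URL,
            json={"prompt": prompt, "n_predict": 1024},  # llama.cpp /completion fields
            timeout=120,
        )
        resp.raise_for_status()
        # The Orpheus model emits custom audio tokens rather than plain text;
        # Orpheus-FastAPI parses these and decodes them with SNAC into a waveform.
        return resp.json()["content"]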

Quick Start & Requirements

  • Docker Compose (GPU): docker compose -f docker-compose-gpu.yml up
  • Docker Compose (CPU): docker compose -f docker-compose-cpu.yml up
  • Native Install: Requires Python 3.8-3.11, a CUDA-compatible GPU (RTX recommended), PyTorch with CUDA support, and llama.cpp or a compatible inference server.
  • Setup: Docker Compose orchestrates both the FastAPI server and the inference server; a native install involves cloning the repo, creating a virtual environment, installing dependencies, and running python app.py.
  • Docs: OpenAI API Compatible Endpoint (see the request sketch below)
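
Once a stack is running, any OpenAI-style client can drive the endpoint. A minimal sketch, assuming the server listens on localhost:5005 and that "tara" is one of the available voices (both assumptions; adjust to your deployment):

    import requests

    resp = requests.post(
        "http://localhost:5005/v1/audio/speech",  # host/port assumed
        json={
            "model": "orpheus",                   # model name is an assumption
            "input": "Hello there! <laugh> Nice to meet you.",  # emotion tag inline
            "voice": "tara",                      # assumed voice name
            "response_format": "wav",
        },
        timeout=300,
    )
    resp.raise_for_status()
    with open("speech.wav", "wb") as f:
        f.write(resp.content)                     # raw audio bytes from the server

Because the request body mirrors OpenAI's /v1/audio/speech schema, existing OpenAI SDK integrations should only need their base URL pointed at this server.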

Highlighted Details

  • OpenAI API compatible /v1/audio/speech endpoint.
  • Supports 24 voices across 8 languages with emotion tags (e.g., <laugh>).
  • Handles unlimited audio length via sentence-based batching and crossfade stitching (see the stitching sketch after this list).
  • Offers quantized models (Q2_K, Q4_K_M, Q8_0) for improved inference speed.
  • Integrates with OpenWebUI for chatbot voice capabilities.
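
The unlimited-length behaviour comes from synthesising sentences in batches and joining the resulting segments with short crossfades. A minimal NumPy illustration of that stitching step, assuming mono float arrays at 24 kHz; the fade length and linear ramp are assumptions, not the project's actual implementation:

    import numpy as np

    def crossfade_stitch(segments, sample_rate=24000, fade_ms=50.0):
        """Join 1-D audio segments with a linear crossfade to hide seams between batches."""
        fade = int(sample_rate * fade_ms / 1000)
        out = segments[0].astype(np.float32)
        for seg in segments[1:]:
            seg = seg.astype(np.float32)
            n = min(fade, len(out), len(seg))
            if n == 0:                      # nothing to blend; just append
                out = np.concatenate([out, seg])
                continue
            ramp = np.linspace(0.0, 1.0, n, dtype=np.float32)
            # Blend the tail of the accumulated audio with the head of the next segment.
            blended = out[-n:] * (1.0 - ramp) + seg[:n] * ramp
            out = np.concatenate([out[:-n], blended, seg[n:]])
        return out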

Maintenance & Community

Recent updates (v1.3.0, April 2025) added new languages, voices, and Docker Compose support. Contributions are acknowledged.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and linking with closed-source applications.

Limitations & Caveats

Python 3.12 is not supported due to dependency issues. While long-form audio is supported, slight discontinuities between segments may occur due to architectural constraints of the underlying model. The repetition penalty is hardcoded to 1.1.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 12
  • Star history: 173 stars in the last 90 days
