Orpheus-FastAPI by Lex-au

Text-to-speech server with OpenAI-compatible API and web UI

Created 4 months ago · 485 stars · Top 64.2% on sourcepulse

Project Summary

This project provides a high-performance Text-to-Speech (TTS) server with an OpenAI-compatible API, targeting developers and users seeking to integrate advanced TTS capabilities into applications or chatbots. It offers multilingual support, emotion tags, and a modern web UI, optimized for RTX GPUs.

How It Works

Orpheus-FastAPI acts as a frontend that interfaces with an external LLM inference server running the Orpheus model. It sends text prompts to the inference server, which generates tokens. These tokens are then processed by the SNAC model to produce audio. The system is optimized for RTX GPUs using vectorised tensor operations, parallel processing with CUDA streams, efficient memory management, and intelligent batching. It also features automatic hardware detection to adapt performance settings for different GPU and CPU configurations.
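
A minimal sketch of that flow, assuming a llama.cpp-style /completion endpoint on localhost:8080; the URL, payload fields, and token handling here are illustrative assumptions, not the project's actual code:

    import requests

    # Sketch of the prompt -> inference server -> audio tokens flow described above.
    # The real server adds streaming, batching, and CUDA-specific optimisations.
    INFERENCE_URL = "http://127.0.0.1:8080/completion"  # assumed llama.cpp-style server

    def request_audio_tokens(prompt: str) -> str:
        """Send a text prompt to the external inference server running Orpheus."""
        resp = requests.post(
            INFERENCE_URL,
            json={"prompt": prompt, "n_predict": 1024},  # llama.cpp /completion fields
            timeout=120,
        )
        resp.raise_for_status()
        # The Orpheus model emits custom audio tokens rather than plain text;
        # Orpheus-FastAPI parses these and decodes them with SNAC into a waveform.
        return resp.json()["content"]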

Quick Start & Requirements

  • Docker Compose (GPU): docker compose -f docker-compose-gpu.yml up
  • Docker Compose (CPU): docker compose -f docker-compose-cpu.yml up
  • Native Install: Requires Python 3.8-3.11, a CUDA-compatible GPU (RTX recommended), PyTorch with CUDA support, and llama.cpp or a compatible inference server.
  • Setup: Docker Compose orchestrates both the FastAPI server and the inference server; a native install involves cloning the repo, creating a virtual environment, installing dependencies, and running python app.py.
  • Docs: OpenAI API Compatible Endpoint (see the request sketch below)
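
Once a stack is running, any OpenAI-style client can drive the endpoint. A minimal sketch, assuming the server listens on localhost:5005 and that "tara" is one of the available voices (both assumptions; adjust to your deployment):

    import requests

    resp = requests.post(
        "http://localhost:5005/v1/audio/speech",  # host/port assumed
        json={
            "model": "orpheus",                   # model name is an assumption
            "input": "Hello there! <laugh> Nice to meet you.",  # emotion tag inline
            "voice": "tara",                      # assumed voice name
            "response_format": "wav",
        },
        timeout=300,
    )
    resp.raise_for_status()
    with open("speech.wav", "wb") as f:
        f.write(resp.content)                     # raw audio bytes from the server

Because the request body mirrors OpenAI's /v1/audio/speech schema, existing OpenAI SDK integrations should only need their base URL pointed at this server.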

Highlighted Details

  • OpenAI API compatible /v1/audio/speech endpoint.
  • Supports 24 voices across 8 languages with emotion tags (e.g., <laugh>).
  • Handles unlimited audio length via sentence-based batching and crossfade stitching (see the stitching sketch after this list).
  • Offers quantized models (Q2_K, Q4_K_M, Q8_0) for improved inference speed.
  • Integrates with OpenWebUI for chatbot voice capabilities.
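
The unlimited-length behaviour comes from synthesising sentences in batches and joining the resulting segments with short crossfades. A minimal NumPy illustration of that stitching step, assuming mono float arrays at 24 kHz; the fade length and linear ramp are assumptions, not the project's actual implementation:

    import numpy as np

    def crossfade_stitch(segments, sample_rate=24000, fade_ms=50.0):
        """Join 1-D audio segments with a linear crossfade to hide seams between batches."""
        fade = int(sample_rate * fade_ms / 1000)
        out = segments[0].astype(np.float32)
        for seg in segments[1:]:
            seg = seg.astype(np.float32)
            n = min(fade, len(out), len(seg))
            if n == 0:                      # nothing to blend; just append
                out = np.concatenate([out, seg])
                continue
            ramp = np.linspace(0.0, 1.0, n, dtype=np.float32)
            # Blend the tail of the accumulated audio with the head of the next segment.
            blended = out[-n:] * (1.0 - ramp) + seg[:n] * ramp
            out = np.concatenate([out[:-n], blended, seg[n:]])
        return out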

Maintenance & Community

Recent updates (v1.3.0, April 2025) added new languages, voices, and Docker Compose support. Contributions are acknowledged.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and linking with closed-source applications.

Limitations & Caveats

Python 3.12 is not supported due to dependency issues. While long-form audio is supported, slight discontinuities between segments may occur due to architectural constraints of the underlying model. The repetition penalty is hardcoded to 1.1.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 12
  • Star history: 173 stars in the last 90 days
