chatterbox-tts-api by travisvn

OpenAI-compatible TTS API with voice cloning

Created 5 months ago

359 stars

Top 77.9% on SourcePulse

Project Summary

This project provides a local, OpenAI-compatible Text-to-Speech (TTS) API powered by FastAPI and Chatterbox TTS. It enables voice cloning and offers features like a React frontend, extensive configuration, and real-time status monitoring. The primary benefit is providing a self-hosted, high-quality TTS solution that seamlessly integrates with existing OpenAI API-compatible applications, offering greater control and privacy.

How It Works

The API uses the Chatterbox TTS model to generate speech from text. Its endpoints mimic the OpenAI TTS API, so it can serve as a drop-in replacement. Key features include voice cloning from user-provided audio samples, a voice library for managing custom voices, and smart text processing for handling long inputs. Building on FastAPI contributes to performance and provides automatically generated API documentation.
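The summary does not describe how the "smart text processing" splits long inputs. As a rough illustration of the general idea only, the sketch below greedily packs whole sentences into chunks under a character limit and hard-splits anything longer. The names `chunk_text` and `_split_words` and the 300-character default are hypothetical, not the project's actual code or settings.

```python
import re


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Break an over-long sentence into pieces of at most max_chars."""
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        # Hard-cut any single word that exceeds the limit on its own.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical illustration; not the chatterbox-tts-api implementation.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        pieces = [sentence] if len(sentence) <= max_chars else _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # Current chunk is full: emit it and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the audio concatenated; keeping breaks at sentence boundaries tends to avoid unnatural pauses mid-phrase.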

Quick Start & Requirements

Installation: Clone the repository, install dependencies using uv sync (recommended) or pip install -r requirements.txt, and start the API with uv run main.py or uvicorn app.main:app --host 0.0.0.0 --port 4123. Docker is also recommended for deployment.
Prerequisites: Python 3.11+ is recommended, and a GPU is recommended for faster inference.
Setup Time: Minimal for basic local setup; Docker deployment is also straightforward.
Documentation: Complete Streaming Examples & Documentation; Status API Documentation.
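Because the endpoints mimic the OpenAI TTS API, a client can send the same kind of JSON request it would send to OpenAI. The stdlib-only sketch below assumes the server mirrors OpenAI's `POST /v1/audio/speech` route on port 4123 (the port from the uvicorn command above); the `model`, `voice`, and `response_format` values are placeholders, not values confirmed for this project. The helper names `build_speech_request` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Assumption: OpenAI-style route served on the port from the uvicorn command.
BASE_URL = "http://localhost:4123"


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1",
                         response_format: str = "wav",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,                      # placeholder model name
        "input": text,                       # the text to synthesize
        "voice": voice,                      # placeholder voice name
        "response_format": response_format,  # placeholder audio format
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, **kwargs) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, `synthesize("Hello from a local server.", "hello.wav")` would save the response body to disk. An existing OpenAI SDK client could instead be pointed at the local base URL, which is what "drop-in replacement" implies.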

Highlighted Details

OpenAI-compatible API for seamless integration.
Voice cloning with support for custom voice samples (MP3, WAV, FLAC, M4A, OGG, max 10MB).
Real-time audio streaming via raw audio chunks or Server-Sent Events (SSE).
Includes an optional React-based web UI for a full-stack experience.
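The summary does not give the event names or payload used in SSE mode, so the sketch below handles only the generic SSE framing: `field: value` lines, multiple `data` lines joined with newlines, comment lines starting with `:`, and a blank line that dispatches the event. How each event's data encodes audio is left to the caller; the function name `parse_sse` is hypothetical.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event_type, data) pairs from a Server-Sent Events text stream.

    The 'id' and 'retry' fields are ignored in this sketch.
    """
    event_type = ""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if it has data.
            if data_lines:
                yield event_type or "message", "\n".join(data_lines)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip one optional leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    # An unterminated trailing event is discarded, as the SSE format requires.
```

With `urllib`, the lines could come from `(line.decode("utf-8") for line in response)`; the caller would then decode each event's data according to whatever format the server actually sends.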

Maintenance & Community

Discord: Join the Discord for community support.
Issues: Report bugs and feature requests via GitHub issues.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The license is not specified, which may impact commercial use. Voice cloning requires 10-30 seconds of clear speech with minimal background noise for best results. The README mentions potential CUDA/CPU compatibility issues if PyTorch is not correctly configured.
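Before uploading a voice sample, a client could pre-check it against the limits quoted above (MP3, WAV, FLAC, M4A, or OGG; at most 10MB; ideally 10-30 seconds of speech). This is a hypothetical client-side sketch, not part of the project: `check_voice_sample` is an invented name, "10MB" is assumed to mean 10 MiB, and duration is measured only for uncompressed WAV files via the stdlib `wave` module. It cannot judge background noise or speech clarity.

```python
import os
import wave

# Limits from the project summary; reading "10MB" as 10 MiB is an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 10 * 1024 * 1024
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0


def check_voice_sample(path: str) -> list[str]:
    """Return a list of problems with a voice-cloning sample (empty list = looks OK)."""
    problems: list[str] = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    if ext == ".wav":
        # Duration in seconds = frame count / sample rate (WAV only).
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"clip is {seconds:.1f}s; 10-30s of clear speech works best")
    return problems
```

The 10-30 second range is reported as a warning rather than an error, since the summary presents it as guidance for best results rather than a hard limit.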

Health Check

Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 3
Issues (30d): 4

Star History: 51 stars in the last 30 days

Explore Similar Projects

whispering-ui by Sharrnah

Native UI for live audio transcription/translation

Created 2 years ago

Updated 1 day ago

Auralis by astramind-ai

TTS engine for fast voice cloning

Created 1 year ago

Updated 9 months ago

RuntimeSpeechRecognizer by gtreshchev

Unreal Engine plugin for real-time, offline speech recognition

Created 2 years ago

Updated 8 months ago

Dia-TTS-Server by devnen

Self-host a powerful TTS model with an OpenAI-compatible API

Created 6 months ago

Updated 5 months ago

ttsfm by dbccccccc

API server mirroring OpenAI's TTS service

Created 7 months ago

Updated 1 week ago

xtts-api-server by daswer123

FastAPI server for XTTSv2 text-to-speech

Created 1 year ago

Updated 1 year ago

Starred by Abubakar Abid (Cofounder of Gradio).

Chatterbox-TTS-Server by devnen

Self-host a powerful TTS server with a web UI and API

Created 5 months ago

Updated 3 months ago

Starred by Laurent Mazare (Cofounder of Kyutai).

unmute by kyutai-labs

LLM voice and speech interface

Created 4 months ago

Updated 1 day ago

Speech-AI-Forge by lenML

TTS API server and Gradio WebUI

Created 1 year ago

Updated 1 month ago

alltalk_tts by erew123

Text-to-speech tool based on Coqui TTS engine

Created 1 year ago

Updated 3 months ago

Starred by Luis Capelo (Cofounder of Lightning AI).

RealtimeVoiceChat by KoljaB

Real-time voice chat with AI using streaming audio

Created 6 months ago

Updated 3 months ago

neutts-air by neuphonic

On-device Text-to-Speech with instant voice cloning

Created 1 month ago

Updated 6 days ago
