chatterbox-tts-api  by travisvn

OpenAI-compatible TTS API with voice cloning

Created 3 months ago
289 stars

Top 91.1% on SourcePulse

Project Summary

This project provides a local, OpenAI-compatible Text-to-Speech (TTS) API built on FastAPI and Chatterbox TTS. It supports voice cloning and includes a React frontend, extensive configuration options, and real-time status monitoring. The main benefit is a self-hosted TTS service that works with existing OpenAI API-compatible applications while giving users more control and privacy.

How It Works

The API uses the Chatterbox TTS model to generate speech from text. It exposes endpoints that mimic the OpenAI TTS API, so it can serve as a drop-in replacement. Key features include voice cloning from user-provided audio samples, a voice library for managing custom voices, and text processing for handling long inputs. FastAPI serves the endpoints and generates the API documentation automatically.
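To illustrate what "OpenAI-compatible" could mean for a client, here is a minimal sketch using only the Python standard library. It rests on assumptions not confirmed by this summary: that the server listens on localhost port 4123 (the port in the uvicorn command under Quick Start), that it serves the OpenAI-style route /v1/audio/speech, and that it accepts the OpenAI-style "input" and "voice" fields. The voice name "alloy" and the helper names are illustrative only; check the project's own documentation for the actual routes and parameters.

```python
import json
import urllib.request

# Assumptions (not confirmed by this summary): host, port and route.
BASE_URL = "http://localhost:4123"
SPEECH_PATH = "/v1/audio/speech"  # OpenAI-style route, assumed here


def build_speech_request(text, voice="alloy", base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style speech request.

    The "input" and "voice" field names follow the OpenAI TTS API;
    whether this server accepts them unchanged is an assumption.
    """
    payload = {"input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path, **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text, **kwargs)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return out_path
```

With a server running, a call such as synthesize("Hello there", "hello.audio") would save whatever audio bytes the server returns; the output container format depends on the server's configuration and is not assumed here.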

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies using uv sync (recommended) or pip install -r requirements.txt, and start the API with uv run main.py or uvicorn app.main:app --host 0.0.0.0 --port 4123. Docker is also recommended for deployment.
  • Prerequisites: Python 3.11+ is recommended, and a GPU is recommended for performance.
  • Setup Time: Minimal for basic local setup; Docker deployment is also straightforward.
  • Documentation: Complete Streaming Examples & Documentation; Status API Documentation

Highlighted Details

  • OpenAI-compatible API for seamless integration.
  • Voice cloning with support for custom voice samples (MP3, WAV, FLAC, M4A, OGG, max 10MB).
  • Real-time audio streaming via raw audio chunks or Server-Sent Events (SSE).
  • Includes an optional React-based web UI for a full-stack experience.
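The voice-sample constraints listed above (MP3, WAV, FLAC, M4A or OGG, at most 10MB) can be checked on the client before uploading. The sketch below is a hypothetical pre-flight helper, not part of the project; it assumes "10MB" means 10 MiB (10 × 1024 × 1024 bytes) and that the format is judged by file extension, neither of which the summary confirms.

```python
from pathlib import Path

# Formats listed in the summary; matching by extension is an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
# Assumption: "10MB" is read as 10 MiB. The server may count differently.
MAX_BYTES = 10 * 1024 * 1024


def check_voice_sample(filename, size_bytes):
    """Return a list of problems; an empty list means the sample looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

For a file on disk, the size can be taken from Path(path).stat().st_size. The server remains the authority on what it accepts; this check only avoids obviously doomed uploads.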

Maintenance & Community

  • Discord: Join the Discord for community support.
  • Issues: Report bugs and feature requests via GitHub issues.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The license is not specified, which may impact commercial use.
  • Voice cloning works best with 10-30 seconds of clear speech and minimal background noise.
  • The README mentions potential CUDA/CPU compatibility issues if PyTorch is not correctly configured.
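The 10-30 second guideline for voice samples can be checked locally before a sample is used. The following is a minimal sketch using Python's standard wave module, so it only handles uncompressed WAV files; the other accepted formats would need a separate decoder. The function names are hypothetical, and the silent-WAV generator exists only to make the example self-contained.

```python
import io
import wave

# Recommended sample length from the summary, in seconds.
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0


def wav_duration_seconds(source):
    """Return the duration of an uncompressed WAV (path string or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def in_recommended_range(seconds):
    """True if the duration falls inside the 10-30 second guideline."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def make_silent_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf
```

For example, in_recommended_range(wav_duration_seconds("my_voice.wav")) reports whether a recording meets the length guideline. It says nothing about clarity or background noise, which still need to be judged by ear.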

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 8
  • Star history: 61 stars in the last 30 days
