Self-host a powerful TTS server with a web UI and API
This project provides a self-hostable server for the Chatterbox TTS model, offering a user-friendly web UI and an OpenAI-compatible API. It targets developers and users needing to generate high-quality speech, perform voice cloning, and process large text volumes for applications like audiobook creation, with accelerated performance on NVIDIA, AMD, and Apple Silicon hardware.
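As a rough illustration of what calling the OpenAI-compatible API might look like, the sketch below builds (but does not send) a POST request for the /v1/audio/speech path listed on this page. The field names `model`, `input`, `voice`, and `response_format` follow the OpenAI speech API; whether this server accepts exactly these fields is an assumption, and the model name, voice name, host, and port are hypothetical placeholders.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, voice: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style speech endpoint (not sent here)."""
    payload = {
        "model": "chatterbox",     # hypothetical model name
        "input": text,             # text to synthesize
        "voice": voice,            # hypothetical voice identifier
        "response_format": "wav",  # assumed to be supported by the server
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host and port; adjust to wherever the server is listening.
request = build_speech_request("http://localhost:8004", "Hello from the TTS server.", "example-voice")
```

Passing the request to `urllib.request.urlopen` and writing the response bytes to a file would save the generated audio, provided a server is running at that address.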
How It Works
The server leverages the Chatterbox TTS engine, enhanced with a FastAPI backend for robust API and UI functionality. It intelligently chunks long text inputs based on sentence structure for seamless audio concatenation, supports voice cloning via reference audio, and offers predefined voices for consistent output. Generation consistency is further improved by an optional seed parameter.
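The sentence-based chunking described above can be pictured with a minimal sketch. This is not the server's actual implementation: the regex sentence splitter, the function names, and the `max_chars` limit are all assumptions chosen for illustration. Sentences are packed greedily into chunks so that no chunk is cut mid-sentence.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_text("One. Two. Three.", max_chars=9)
```

Each chunk would then be synthesized separately and the resulting audio segments joined in order. Keeping chunk boundaries at sentence ends is what lets the joins fall on natural pauses.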
Quick Start & Requirements
Install dependencies with pip install -r requirements-nvidia.txt (for NVIDIA), requirements-rocm.txt (for AMD), or requirements.txt (for CPU). Apple Silicon requires a specific multi-step installation.
System dependencies: libsndfile1 and ffmpeg.
API documentation is available at /docs after server startup.
Highlighted Details
OpenAI-compatible API via the /v1/audio/speech endpoint.
Maintenance & Community
The project is actively maintained by devnen. Community interaction and contributions are encouraged via GitHub issues and pull requests.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
AMD ROCm support is limited to Linux. While the server offers extensive troubleshooting, specific ROCm compatibility issues or older AMD GPU architectures might require manual configuration overrides. The "UI Cancel" button stops the frontend waiting but does not immediately halt backend inference.