TTS API server and Gradio WebUI
Speech-AI-Forge is a comprehensive toolkit for Text-to-Speech (TTS) generation, offering a robust API server and an interactive Gradio WebUI. It targets developers and researchers seeking to integrate advanced TTS capabilities into their applications, providing features like multi-model support, voice cloning, and SSML integration for fine-grained control over speech synthesis.
How It Works
The project acts as a unified inference framework, abstracting the complexities of various TTS models including ChatTTS, CosyVoice, FishSpeech, and others. It supports both streaming and sentence-level synthesis, with an emphasis on flexible voice management, including custom voice uploads, reference audio cloning, and a dedicated "Voice Builder" for creating new voice models. An integrated Automatic Speech Recognition (ASR) component leverages Whisper for speech-to-text tasks.
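The "unified inference framework" idea can be illustrated with a small sketch. This is not the project's actual code or API: the `TTSBackend` protocol, the `Forge` registry, the `SilentBackend` stand-in, and `split_sentences` are all hypothetical names chosen for illustration. It only shows how several models might sit behind one interface, with streaming built on sentence-level synthesis.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol


class TTSBackend(Protocol):
    """Hypothetical common interface that every model wrapper would satisfy."""

    def synthesize(self, text: str, voice: str) -> bytes: ...


@dataclass
class SilentBackend:
    """Stand-in for a real model: emits 16-bit PCM silence, 10 ms per character."""

    name: str
    sample_rate: int = 24000

    def synthesize(self, text: str, voice: str) -> bytes:
        n_samples = len(text) * self.sample_rate // 100
        return b"\x00\x00" * n_samples  # two bytes per 16-bit sample


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


class Forge:
    """Hypothetical registry that routes requests to a named backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, TTSBackend] = {}

    def register(self, name: str, backend: TTSBackend) -> None:
        self._backends[name] = backend

    def stream(self, model: str, text: str, voice: str = "default") -> Iterator[bytes]:
        # Streaming mode: yield one audio chunk per sentence as soon as it is ready.
        backend = self._backends[model]
        for sentence in split_sentences(text):
            yield backend.synthesize(sentence, voice)

    def synthesize(self, model: str, text: str, voice: str = "default") -> bytes:
        # Non-streaming mode: concatenate the per-sentence chunks.
        return b"".join(self.stream(model, text, voice))


forge = Forge()
# The keys are placeholder labels; real model wrappers would go here.
forge.register("chattts", SilentBackend("chattts"))
forge.register("cosyvoice", SilentBackend("cosyvoice"))

chunks = list(forge.stream("chattts", "Hello there. How are you?"))
```

Because callers only see `Forge.stream` and `Forge.synthesize`, switching models in this sketch is a matter of changing the `model` string; voice handling is reduced to a plain `voice` argument and would need far more in a real system.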
Quick Start & Requirements
Download the models before running: python -m scripts.download_models --source huggingface
WebUI: python webui.py
API server: python launch.py
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The --compile flag is not recommended due to potential performance issues with dynamic shapes.