lenML
TTS API server and Gradio WebUI
Top 29.7% on SourcePulse
Speech-AI-Forge is a comprehensive toolkit for Text-to-Speech (TTS) generation, offering a robust API server and an interactive Gradio WebUI. It targets developers and researchers seeking to integrate advanced TTS capabilities into their applications, providing features like multi-model support, voice cloning, and SSML integration for fine-grained control over speech synthesis.
How It Works
The project acts as a unified inference framework, abstracting the complexities of various TTS models including ChatTTS, CosyVoice, FishSpeech, and others. It supports both streaming and sentence-level synthesis, with an emphasis on flexible voice management, including custom voice uploads, reference audio cloning, and a dedicated "Voice Builder" for creating new voice models. An integrated Automatic Speech Recognition (ASR) component leverages Whisper for speech-to-text tasks.
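The idea of one front end over several interchangeable models, with sentence-level and streaming output, can be illustrated with a minimal sketch. Everything named here (BACKENDS, register, dummy_backend, split_sentences, synthesize_stream, synthesize) is hypothetical and is not taken from Speech-AI-Forge's code; the "dummy" backend stands in for a real model such as ChatTTS or CosyVoice and returns silent samples only.

```python
import re
from typing import Callable, Dict, Iterator, List

# Hypothetical type: a backend maps text to a list of float audio samples.
Synthesizer = Callable[[str], List[float]]

# Hypothetical registry mapping a model name to its backend function.
BACKENDS: Dict[str, Synthesizer] = {}


def register(name: str) -> Callable[[Synthesizer], Synthesizer]:
    """Register a backend under a model name."""
    def decorator(fn: Synthesizer) -> Synthesizer:
        BACKENDS[name] = fn
        return fn
    return decorator


@register("dummy")
def dummy_backend(text: str) -> List[float]:
    # Placeholder for a real model: one silent sample per character.
    return [0.0] * len(text)


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def synthesize_stream(text: str, model: str) -> Iterator[List[float]]:
    """Streaming mode: yield one audio chunk per sentence as it is ready."""
    backend = BACKENDS[model]
    for sentence in split_sentences(text):
        yield backend(sentence)


def synthesize(text: str, model: str) -> List[float]:
    """Sentence-level mode: synthesize every sentence, then concatenate."""
    audio: List[float] = []
    for chunk in synthesize_stream(text, model):
        audio.extend(chunk)
    return audio


if __name__ == "__main__":
    sample = "Hello world. How are you? Fine."
    for i, chunk in enumerate(synthesize_stream(sample, "dummy")):
        print(f"chunk {i}: {len(chunk)} samples")
```

In this sketch the caller only picks a model name, so adding another model means registering one more function, and the streaming path lets a client start playback before the full text has been processed.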
Quick Start & Requirements
python -m scripts.download_models --source huggingface is required before running.
python webui.py
python launch.py
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The --compile flag is not recommended due to potential performance issues with dynamic shapes.