Speech-AI-Forge by lenML

TTS API server and Gradio WebUI

Created 1 year ago

1,383 stars

Top 28.8% on SourcePulse

Project Summary

Speech-AI-Forge is a comprehensive toolkit for Text-to-Speech (TTS) generation, offering a robust API server and an interactive Gradio WebUI. It targets developers and researchers seeking to integrate advanced TTS capabilities into their applications, providing features like multi-model support, voice cloning, and SSML integration for fine-grained control over speech synthesis.

How It Works

The project acts as a unified inference framework, abstracting the complexities of various TTS models including ChatTTS, CosyVoice, FishSpeech, and others. It supports both streaming and sentence-level synthesis, with an emphasis on flexible voice management, including custom voice uploads, reference audio cloning, and a dedicated "Voice Builder" for creating new voice models. An integrated Automatic Speech Recognition (ASR) component leverages Whisper for speech-to-text tasks.

Quick Start & Requirements

Installation: Models must be downloaded manually before the first run: python -m scripts.download_models --source huggingface
Prerequisites: Python, PyTorch. Specific models may have additional dependencies. GPU acceleration is highly recommended for performance.
Running:
- WebUI: python webui.py
- API Server: python launch.py
Documentation: Installation and Running
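
The quick-start steps above can be sketched as a small Python wrapper. This is a minimal sketch, not part of the project: the function names build_commands and run are hypothetical helpers introduced for illustration, and the only commands used are the three quoted above. It assumes the script is run from the repository root with the project's dependencies already installed.

```python
import subprocess
import sys

# Commands quoted in the quick-start notes; everything else here is illustrative.
DOWNLOAD_CMD = [sys.executable, "-m", "scripts.download_models", "--source", "huggingface"]
ENTRY_POINTS = {"webui": "webui.py", "api": "launch.py"}


def build_commands(mode, download=True):
    """Return the argv lists to run, in order, for the chosen mode (hypothetical helper)."""
    if mode not in ENTRY_POINTS:
        raise ValueError(f"mode must be one of {sorted(ENTRY_POINTS)}, got {mode!r}")
    commands = [list(DOWNLOAD_CMD)] if download else []
    commands.append([sys.executable, ENTRY_POINTS[mode]])
    return commands


def run(mode, download=True, dry_run=False):
    """Print each command and, unless dry_run is set, execute it and stop on failure."""
    for argv in build_commands(mode, download):
        print("$", " ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling run("webui", dry_run=True) prints the download and WebUI commands without executing them; run("api", download=False) starts the API server and skips the download step, which is only sensible once the models are already on disk.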

Highlighted Details

Supports multiple TTS models (ChatTTS, CosyVoice, FishSpeech, F5-TTS, etc.) and ASR (Whisper).
Features advanced voice cloning via reference audio and a "Voice Builder" for custom voice creation.
Includes SSML support for detailed control over speech synthesis, with a dedicated script editor.
Offers a voice enhancer and post-processing tools for optimizing audio output.
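
To illustrate the kind of control SSML gives, the sketch below builds a small SSML document with Python's standard library. It uses generic elements from the W3C SSML specification (speak, prosody, break) in simplified form, without the XML namespace. The tags and attributes accepted by Speech-AI-Forge's own SSML dialect are not described in this summary and may differ, so treat build_ssml as a hypothetical helper and check the project documentation before sending its output to the server.

```python
import xml.etree.ElementTree as ET


def build_ssml(sentences, rate="medium", pause_ms=300):
    """Build a generic W3C-style SSML string with a pause between sentences.

    Assumption: element and attribute names follow the W3C SSML spec,
    not necessarily the dialect this project accepts.
    """
    speak = ET.Element("speak", {"version": "1.1"})
    for index, sentence in enumerate(sentences):
        if index > 0:
            # Insert an explicit pause before every sentence except the first.
            ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
        prosody = ET.SubElement(speak, "prosody", {"rate": rate})
        prosody.text = sentence
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml(["Hello there.", "This is a test."], rate="slow", pause_ms=500))
```

Building the markup through an XML library rather than string concatenation keeps the document well-formed and escapes characters such as & and < in the input text.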

Maintenance & Community

The project is under active development, with ongoing feature additions and model integrations.
Community support is available via the project's Discord server.

Licensing & Compatibility

The README does not explicitly state a project-wide license. The project appears to be permissively licensed, but this should be confirmed in the repository, and the license of each individual model should be checked for compatibility before use.

Limitations & Caveats

Model download is a manual process.
Some features, like the SenseVoice ASR and GPT-SoVITS TTS, are marked as "in development" (🚧).
The --compile flag is not recommended due to potential performance issues with dynamic shapes.

Health Check

Last Commit: 3 weeks ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 5

Star History

5 stars in the last 30 days

Explore Similar Projects

Voice-Clone-Studio by FranckyB

Gradio web UI for advanced voice cloning and design

Created 1 month ago

Updated 1 day ago

Starred by Elvis Saravia (Founder of DAIR.AI).

S.A.T.U.R.D.A.Y by GRVYDEV

Vocal computing toolbox for building voice interfaces to LLMs

Created 2 years ago

Updated 2 years ago

gpt-voice-conversation-chatbot by Adri6336

Voice chatbot for engaging spoken conversations with ChatGPT/GPT-4

Created 3 years ago

Updated 1 year ago

SonicVale by xcLee001

AI voice generation platform for diverse content

Created 5 months ago

Updated 2 weeks ago

xtts2-ui by BoltzmannEntropy

UI for text-based voice cloning using a 10-second audio sample

Created 2 years ago

Updated 1 year ago

sesame_csm_openai by phildougherty

OpenAI-compatible TTS API for voice cloning

Created 11 months ago

Updated 5 months ago

openedai-speech by matatonic

OpenAI API-compatible server for text-to-speech

Created 2 years ago

Updated 1 year ago

tts by zuoban

TTS service for voice synthesis using Microsoft Azure

Created 1 year ago

Updated 3 weeks ago

alltalk_tts by erew123

Text-to-speech tool based on Coqui TTS engine

Created 2 years ago

Updated 1 month ago

easyVoice by cosin2077

Text-to-speech tool for long texts and multi-character dubbing

Created 11 months ago

Updated 1 month ago

Starred by Abubakar Abid (Cofounder of Gradio).

voice-pro by abus-aikorea

WebUI for speech recognition, translation, and dubbing

Created 1 year ago

Updated 2 months ago

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

Few-shot voice cloning and TTS web UI

Created 2 years ago

Updated 2 weeks ago
