Uni-TTS by X-T-E-R

Unified TTS toolkit for diverse speech synthesis engines

Created 2 years ago

767 stars

Top 45.5% on SourcePulse

Project Summary

This project aims to unify access to various speech synthesis engines by providing a common interface with one adapter per engine. It targets developers and researchers who work with TTS systems and need a flexible framework for integrating different synthesis models. The primary benefit is simpler integration of, and experimentation with, diverse TTS technologies.

How It Works

The core of Uni-TTS is a modular framework with two request parsers: a general parameter parser that supports aliases and backward compatibility, and a parser for Microsoft-style requests. Each TTS engine is integrated through an adapter; the GPT-soVITS adapter is the only one implemented so far. The framework uses FastAPI to return synthesized audio as either a file or a stream, which makes it suitable for deployment as a backend service.
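To make the alias idea concrete, here is a minimal sketch of how a parameter parser with alias support might work. It is not taken from the Uni-TTS source: the schema, the parameter names (`text`, `speed`, `stream`), their aliases, and the `parse_params` function are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


def to_bool(value: Any) -> bool:
    """Interpret common string spellings of booleans; fall back to bool()."""
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


@dataclass(frozen=True)
class ParamSpec:
    """One canonical parameter: accepted aliases, default value, type caster."""
    aliases: Tuple[str, ...]
    default: Any
    cast: Callable[[Any], Any]


# Hypothetical schema: names and aliases are assumptions, not the project's own.
# Older parameter names can be kept as aliases for backward compatibility.
SCHEMA: Dict[str, ParamSpec] = {
    "text": ParamSpec(("text", "content"), "", str),
    "speed": ParamSpec(("speed", "speed_factor"), 1.0, float),
    "stream": ParamSpec(("stream",), False, to_bool),
}


def parse_params(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw request dict onto canonical names, applying defaults and casts."""
    lowered = {str(key).lower(): value for key, value in raw.items()}
    parsed: Dict[str, Any] = {}
    for name, spec in SCHEMA.items():
        value = spec.default
        # The first alias present in the request wins.
        for alias in spec.aliases:
            if alias in lowered:
                value = lowered[alias]
                break
        parsed[name] = spec.cast(value)
    return parsed
```

With this sketch, a request that uses an older or alternative key, such as `{"Content": "hi", "speed_factor": "1.5"}`, resolves to the same canonical parameters as one that uses `text` and `speed`.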

Quick Start & Requirements

Install: Not specified; likely involves cloning the repository and installing Python dependencies.
Prerequisites: Python, FastAPI, Gradio, Faster-Whisper, FunASR, and FFmpeg. The README also implies model-specific requirements (e.g., GPT-soVITS, Chinese Speech Pretrain, Chinese-Roberta-WWM-Ext-Large) and their dependencies, such as CUDA for GPU acceleration.
Resources: Not specified. The pretrained models likely require significant disk space, and efficient inference likely requires a GPU.
Links:
- GPT-soVITS Inference: https://github.com/X-T-E-R/GPT-SoVITS-Inference

Highlighted Details

- Unified interface for multiple TTS engines.
- Includes a GPT-soVITS adapter.
- Supports FastAPI for file and stream output.
- Integrates tools like Gradio, Faster-Whisper, and FFmpeg.
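
The unified interface and the file-versus-stream output can be sketched with a small adapter pattern. This is an illustrative sketch only, not the project's actual code: `TTSAdapter`, `SilenceAdapter`, `ADAPTERS`, and `get_adapter` are hypothetical names, and the stand-in engine emits silence instead of running a real model.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Type


class TTSAdapter(ABC):
    """Hypothetical common interface that every engine adapter implements."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Return the complete audio for `text` as raw bytes (file-style output)."""

    def stream(self, text: str, chunk_size: int = 4096) -> Iterator[bytes]:
        """Yield the audio in fixed-size chunks (stream-style output)."""
        audio = self.synthesize(text)
        for start in range(0, len(audio), chunk_size):
            yield audio[start:start + chunk_size]


class SilenceAdapter(TTSAdapter):
    """Stand-in engine (an assumption): two zero bytes per input character."""

    def synthesize(self, text: str) -> bytes:
        return b"\x00\x00" * len(text)


# Registry mapping engine names to adapter classes; a real engine adapter
# would be registered here in the same way.
ADAPTERS: Dict[str, Type[TTSAdapter]] = {"silence": SilenceAdapter}


def get_adapter(name: str) -> TTSAdapter:
    """Instantiate the adapter registered under `name`."""
    try:
        return ADAPTERS[name]()
    except KeyError:
        raise ValueError(f"unknown TTS engine: {name!r}") from None
```

In a web service, the bytes from `synthesize` would back a file response and the generator from `stream` would back a streaming response, so callers can switch engines by name without changing the rest of the request handling.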

Maintenance & Community

The project acknowledges contributions from the GPT-soVITS project and lists several other related projects and pretrained models it leverages or is inspired by. No specific community channels or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Because the project relies heavily on, and includes code from, other projects (e.g., GPT-soVITS, GSVI), their licenses determine how Uni-TTS can be used and redistributed. Commercial use may be restricted depending on the licenses of the incorporated components.

Limitations & Caveats

The README explicitly states "Please do not use this project for now" and indicates that documentation is pending updates. The project is presented as under active development, with only the framework, parser, GPT-soVITS adapter, and FastAPI return types currently implemented.

Explore Similar Projects

insanely-fast-whisper-cli by ochen1

assem-vc by maum-ai

gptsovits-api by jianchang512

ai-devices by developersdigest

LunaVox by Lux-Luna

Auralis by astramind-ai

Chatterbox-TTS-Extended by petermg

vits-simple-api by Artrajz

QuickAgent by gkamradt

alltalk_tts by erew123

speech-to-speech by huggingface

CosyVoice by FunAudioLLM