TTS-WebUI by rsxdalv

WebUI for local TTS and audio generation

Created 2 years ago
2,549 stars

Top 18.3% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This project provides a unified Gradio and React web UI for numerous cutting-edge text-to-speech (TTS) and audio generation models, targeting researchers and power users who want a consolidated platform for AI-driven audio synthesis. It aims to simplify integration of, and experimentation with, a wide array of advanced models by offering a single interface for diverse audio generation tasks.

How It Works

The project leverages a modular architecture, integrating various TTS and audio generation models as extensions. It utilizes Gradio for the backend API and React for the frontend, providing a dynamic and interactive user experience. This approach allows for easy addition and management of new models, facilitating rapid experimentation and comparison across different synthesis techniques.
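
As an illustration of this extension pattern, the sketch below shows how a single model could be exposed as a Gradio tab. It is not the project's actual extension API: the tab name, model list, and synthesize function are hypothetical placeholders, and the sine-tone output stands in for real synthesis so the snippet runs on its own.

    # Illustrative sketch only; names and the dummy audio output are placeholders.
    import numpy as np
    import gradio as gr

    def synthesize(text: str, model_name: str):
        # A real extension would dispatch `text` to the selected TTS model.
        # This placeholder returns a one-second 440 Hz tone so the sketch runs.
        sr = 22050
        t = np.linspace(0, 1.0, sr, endpoint=False)
        return sr, (0.2 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

    with gr.Blocks() as demo:
        with gr.Tab("Hypothetical TTS extension"):
            text = gr.Textbox(label="Text")
            model = gr.Dropdown(["bark", "tortoise", "xtts-v2"], label="Model")
            audio = gr.Audio(label="Generated audio")
            gr.Button("Generate").click(synthesize, inputs=[text, model], outputs=audio)

    if __name__ == "__main__":
        demo.launch()  # serves the UI plus a programmatic API a separate frontend can call

In the actual project, tabs like this are contributed by installable extensions and managed through the extension manager rather than defined inline.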

Quick Start & Requirements

  • Install: Download the latest version and run start_tts_webui.bat (Windows) or start_tts_webui.sh (macOS, Linux). The script sets up conda and Python virtual environments. Once the server is running, it can also be queried from scripts (see the sketch after this list).
  • Prerequisites: NVIDIA GPU with CUDA 12.4 recommended for optimal performance. Docker is also supported.
  • Resources: Initial model downloads can be substantial.
  • Links: Installation Guide, Docker Setup, Google Colab.
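
As referenced above, a running instance can also be queried from Python via gradio_client. This is a minimal sketch assuming Gradio's default port (7860); your install may bind a different address, and the available endpoints depend on which extensions are loaded, so the sketch only lists them rather than calling a specific one.

    # Minimal sketch: connect to a running instance and list its API endpoints.
    # The URL is an assumption (Gradio's default port); use the address printed
    # when the server starts.
    from gradio_client import Client

    client = Client("http://127.0.0.1:7860/")
    client.view_api()  # prints the endpoints exposed by the loaded extensions
    # A specific endpoint can then be invoked with client.predict(..., api_name="/<endpoint>")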

Highlighted Details

  • Supports over 20 TTS and audio generation models including Bark, Tortoise, XTTSv2, StyleTTS2, MusicGen, and RVC.
  • Features an extension manager for easy model integration and management.
  • Offers both Gradio and React UIs for a comprehensive user experience.
  • Includes a Docker image for simplified deployment and reproducibility.

Maintenance & Community

  • Active development with frequent updates, including recent additions of OpenVoice V2, DIA, Kokoro TTS, CosyVoice, and GPT-SoVITS extensions.
  • Discord community available for support and discussion: https://discord.gg/V8BKTVRtJ9.

Licensing & Compatibility

  • Core codebase is MIT licensed.
  • Dependencies carry varying licenses; some (e.g., encodec, diffq) are CC BY-NC 4.0, which restricts commercial use, and others (lameenc, unidecode) are GPL-licensed.
  • Model weights also have different licenses; users must verify compatibility for their intended use.

Limitations & Caveats

  • Some dependencies (e.g., encodec, diffq) have non-commercial licenses (CC BY-NC 4.0), potentially limiting commercial application.
  • Audiocraft extensions are currently Linux/Windows only; macOS support may require manual installation.
  • The project bundles many AI models, so dependency conflicts and warning messages can occur during installation; these are noted as expected.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 12
  • Star History: 111 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; former CTO at Stability AI), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab
Top 0.2% on SourcePulse · 9k stars
Toolkit for audio, music, and speech generation research
Created 1 year ago · Updated 3 months ago