TTS-WebUI by rsxdalv

WebUI for local TTS and audio generation

Created 2 years ago
2,549 stars

Top 18.3% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

This project provides a unified Gradio and React web UI for numerous cutting-edge text-to-speech (TTS) and audio generation models, targeting researchers and power users who want a consolidated platform for AI-driven audio synthesis. It aims to simplify integration of, and experimentation with, a wide array of advanced models by offering a single interface for diverse audio generation tasks.

How It Works

The project leverages a modular architecture, integrating various TTS and audio generation models as extensions. It utilizes Gradio for the backend API and React for the frontend, providing a dynamic and interactive user experience. This approach allows for easy addition and management of new models, facilitating rapid experimentation and comparison across different synthesis techniques.
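
As an illustration of this extension pattern, the sketch below shows how a single model could be exposed as a Gradio tab. It is not the project's actual extension API: the tab name, model list, and synthesize function are hypothetical placeholders, and the sine-tone output stands in for real synthesis so the snippet runs on its own.

    # Illustrative sketch only; names and the dummy audio output are placeholders.
    import numpy as np
    import gradio as gr

    def synthesize(text: str, model_name: str):
        # A real extension would dispatch `text` to the selected TTS model.
        # This placeholder returns a one-second 440 Hz tone so the sketch runs.
        sr = 22050
        t = np.linspace(0, 1.0, sr, endpoint=False)
        return sr, (0.2 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

    with gr.Blocks() as demo:
        with gr.Tab("Hypothetical TTS extension"):
            text = gr.Textbox(label="Text")
            model = gr.Dropdown(["bark", "tortoise", "xtts-v2"], label="Model")
            audio = gr.Audio(label="Generated audio")
            gr.Button("Generate").click(synthesize, inputs=[text, model], outputs=audio)

    if __name__ == "__main__":
        demo.launch()  # serves the UI plus a programmatic API a separate frontend can call

In the actual project, tabs like this are contributed by installable extensions and managed through the extension manager rather than defined inline.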

Quick Start & Requirements

  • Install: Download the latest version and run start_tts_webui.bat (Windows) or start_tts_webui.sh (macOS, Linux). The script sets up conda and Python virtual environments. Once the server is running, it can also be queried from scripts (see the sketch after this list).
  • Prerequisites: NVIDIA GPU with CUDA 12.4 recommended for optimal performance. Docker is also supported.
  • Resources: Initial model downloads can be substantial.
  • Links: Installation Guide, Docker Setup, Google Colab.
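
As referenced above, a running instance can also be queried from Python via gradio_client. This is a minimal sketch assuming Gradio's default port (7860); your install may bind a different address, and the available endpoints depend on which extensions are loaded, so the sketch only lists them rather than calling a specific one.

    # Minimal sketch: connect to a running instance and list its API endpoints.
    # The URL is an assumption (Gradio's default port); use the address printed
    # when the server starts.
    from gradio_client import Client

    client = Client("http://127.0.0.1:7860/")
    client.view_api()  # prints the endpoints exposed by the loaded extensions
    # A specific endpoint can then be invoked with client.predict(..., api_name="/<endpoint>")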

Highlighted Details

  • Supports over 20 TTS and audio generation models including Bark, Tortoise, XTTSv2, StyleTTS2, MusicGen, and RVC.
  • Features an extension manager for easy model integration and management.
  • Offers both Gradio and React UIs for a comprehensive user experience.
  • Includes a Docker image for simplified deployment and reproducibility.

Maintenance & Community

  • Active development with frequent updates, including recent additions of OpenVoice V2, DIA, Kokoro TTS, CosyVoice, and GPT-SoVITS extensions.
  • Discord community available for support and discussion: https://discord.gg/V8BKTVRtJ9.

Licensing & Compatibility

  • Core codebase is MIT licensed.
  • Dependencies carry varying licenses; some (e.g., encodec, diffq) are CC BY-NC 4.0, which restricts commercial use, and others (lameenc, unidecode) are GPL-licensed.
  • Model weights also have different licenses; users must verify compatibility for their intended use.

Limitations & Caveats

  • Some dependencies (e.g., encodec, diffq) have non-commercial licenses (CC BY-NC 4.0), potentially limiting commercial application.
  • Audiocraft extensions are currently Linux/Windows only; macOS support may require manual installation.
  • The project bundles many AI models, so dependency conflicts and warning messages can occur during installation; these are noted as expected.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 12
  • Star History: 111 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; former CTO at Stability AI), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab
Top 0.2% on SourcePulse · 9k stars
Toolkit for audio, music, and speech generation research
Created 1 year ago · Updated 3 months ago