TTS-WebUI by rsxdalv

WebUI for local TTS and audio generation

Created 2 years ago · 2,396 stars · Top 19.6% on sourcepulse

Project Summary

This project provides a unified Gradio and React web UI for a wide range of cutting-edge text-to-speech (TTS) and audio generation models, targeting researchers and power users who want a consolidated platform for AI-driven audio synthesis. It aims to simplify the integration of, and experimentation with, many advanced models by offering a single interface for diverse audio generation tasks.

How It Works

The project uses a modular architecture in which individual TTS and audio generation models are integrated as extensions. Gradio provides the backend API and React the frontend, giving a dynamic, interactive user experience. This design makes it straightforward to add and manage new models, enabling rapid experimentation and comparison across different synthesis techniques.
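
Because the backend is a standard Gradio app, it can in principle be driven programmatically with the official gradio_client package. The sketch below is illustrative only: it assumes the UI is running locally on Gradio's default port, and the endpoint name and arguments are hypothetical placeholders; the endpoints actually exposed by a running instance can be listed with client.view_api().

```python
# Minimal sketch: scripting a running TTS-WebUI Gradio backend via gradio_client.
# Assumptions: the UI is up at the default local address, and an endpoint that
# accepts a text prompt exists -- "/bark" and its signature are placeholders.
from gradio_client import Client

client = Client("http://127.0.0.1:7860")  # default local Gradio address
client.view_api()                         # lists the endpoints this instance exposes

# Hypothetical call: synthesize speech from a text prompt.
result = client.predict(
    "Hello from TTS-WebUI!",  # text to synthesize
    api_name="/bark",         # placeholder endpoint name
)
print(result)  # typically a path to the generated audio file
```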

Quick Start & Requirements

  • Install: Download the latest version and run start_tts_webui.bat (Windows) or start_tts_webui.sh (macOS, Linux). The script sets up conda and Python virtual environments.
  • Prerequisites: An NVIDIA GPU with CUDA 12.4 is recommended for optimal performance; Docker is also supported. A quick GPU sanity check is sketched after this list.
  • Resources: Initial model downloads can be substantial.
  • Links: Installation Guide, Docker Setup, Google Colab.
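
As a quick pre-flight check before running the start script, the snippet below verifies that PyTorch can see a CUDA-capable GPU. This is a generic check under the assumption that a CUDA-enabled PyTorch build is installed; it is not part of the project's own scripts.

```python
# Minimal sketch: confirm that a CUDA-capable GPU is visible to PyTorch.
# Assumes a CUDA-enabled PyTorch build; not part of TTS-WebUI itself.
import torch

if torch.cuda.is_available():
    print(f"GPU detected: {torch.cuda.get_device_name(0)}")
    print(f"PyTorch built against CUDA {torch.version.cuda}")
else:
    print("No CUDA GPU detected; generation will fall back to CPU and be much slower.")
```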

Highlighted Details

  • Supports over 20 TTS and audio generation models including Bark, Tortoise, XTTSv2, StyleTTS2, MusicGen, and RVC.
  • Features an extension manager for easy model integration and management.
  • Offers both Gradio and React UIs for a comprehensive user experience.
  • Includes a Docker image for simplified deployment and reproducibility.

Maintenance & Community

  • Active development with frequent updates, including recent additions of OpenVoice V2, DIA, Kokoro TTS, CosyVoice, and GPT-SoVITS extensions.
  • Discord community available for support and discussion: https://discord.gg/V8BKTVRtJ9.

Licensing & Compatibility

  • Core codebase is MIT licensed.
  • Dependencies have varying licenses; some, such as encodec and diffq, are CC BY-NC 4.0, which restricts commercial use, while lameenc and unidecode are GPL-licensed.
  • Model weights also have different licenses; users must verify compatibility for their intended use.

Limitations & Caveats

  • Some dependencies (e.g., encodec, diffq) have non-commercial licenses (CC BY-NC 4.0), potentially limiting commercial application.
  • Audiocraft extensions are currently Linux/Windows only; macOS support may require manual installation.
  • The project combines many AI models, so dependency conflicts and warning messages during installation are possible; the maintainers note these are expected.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 7
  • Star history: 286 stars in the last 90 days
