xtts-webui  by daswer123

Web UI for the XTTS text-to-speech model, with fine-tuning support

created 1 year ago
834 stars

Top 43.6% on sourcepulse

Project Summary

XTTS-WebUI provides a web interface for the XTTS speech synthesis model, aimed at users who want to generate speech, clone voices, and dub or translate audio without writing code. It offers batch processing, translation that preserves the speaker's voice, and integration with other AI voice tools such as RVC and OpenVoice, which makes it useful to content creators and developers who need multi-step audio processing in one place.

How It Works

The web UI uses XTTSv2 for speech synthesis and chains additional neural networks and audio tools to improve output quality. It supports batch processing of multiple files, voice cloning, and translation. The interface also includes a fine-tuning tab for building custom voice models, although that tab is currently reported as broken (see Limitations & Caveats). Tools such as RVC, OpenVoice, and Resemble Enhance are integrated as modular, optional post-processing steps.
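The batch flow described above can be sketched in a few lines of Python. This is an illustration only, not the web UI's own code: the helper names (plan_batch, run_batch, make_xtts_synthesizer) and the folder layout are hypothetical, and the synthesizer assumes the Coqui TTS package is installed and uses its public XTTSv2 model name.

```python
from pathlib import Path
from typing import Callable, List, Tuple

Job = Tuple[Path, Path]  # (input text file, output wav file)


def plan_batch(input_dir: Path, output_dir: Path) -> List[Job]:
    """Pair every .txt file in input_dir with a same-named .wav in output_dir."""
    return [(src, output_dir / (src.stem + ".wav"))
            for src in sorted(input_dir.glob("*.txt"))]


def run_batch(jobs: List[Job],
              synthesize: Callable[[str, Path], None]) -> List[Path]:
    """Read each text file, pass its content to `synthesize`, return outputs."""
    written = []
    for src, dst in jobs:
        text = src.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty inputs rather than synthesizing nothing
        dst.parent.mkdir(parents=True, exist_ok=True)
        synthesize(text, dst)
        written.append(dst)
    return written


def make_xtts_synthesizer(speaker_wav: str, language: str = "en"):
    """Build a synthesize(text, out_path) callable backed by Coqui TTS.

    Assumption: the third-party `TTS` package is installed. The import is
    deferred so the batch helpers above work without it.
    """
    from TTS.api import TTS  # downloads the model on first use

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    def synthesize(text: str, out_path: Path) -> None:
        # speaker_wav is the short reference clip whose voice is cloned
        tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                        language=language, file_path=str(out_path))

    return synthesize
```

A possible call, with hypothetical paths, would be run_batch(plan_batch(Path("texts"), Path("out")), make_xtts_synthesizer("reference.wav")). Passing the synthesizer in as a callable keeps the file handling separate from the model, so a post-processing step could be wrapped around it in the same way.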

Quick Start & Requirements

  • Installation: Run install.bat (Windows) or install.sh (Linux), then launch with start_xtts_webui.bat or start_xtts_webui.sh.
  • Prerequisites: Python 3.10.x or 3.11, CUDA 11.8 or 12.1, Microsoft Build Tools 2019 (with C++ package), ffmpeg.
  • Hardware: NVIDIA GPU; 6 GB of VRAM is recommended for the portable version.
  • Documentation: https://github.com/daswer123/xtts-webui
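Before running the install script, it can help to confirm the prerequisites listed above. The following is a minimal sketch, not part of the project: the function names are hypothetical, and the GPU check only looks for nvidia-smi on PATH, which suggests an NVIDIA driver is present but does not confirm the CUDA version or the Build Tools installation.

```python
import shutil
import sys

# Python versions named in the prerequisites list
SUPPORTED_PYTHON = {(3, 10), (3, 11)}


def python_ok(version=None):
    """Return True if the (major, minor) version is 3.10 or 3.11."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHON


def tool_on_path(name):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def check_prerequisites():
    """Return a pass/fail flag per prerequisite; nothing is installed."""
    return {
        "python 3.10/3.11": python_ok(),
        "ffmpeg on PATH": tool_on_path("ffmpeg"),
        # nvidia-smi ships with the NVIDIA driver; this does not verify
        # that CUDA 11.8 or 12.1 is the installed toolkit version.
        "nvidia-smi on PATH": tool_on_path("nvidia-smi"),
    }


if __name__ == "__main__":
    for label, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {label}")
```

Run it with the same interpreter that will be used for the install, since the Python version check applies to whichever interpreter executes the script.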

Highlighted Details

  • Supports batch processing for dubbing large numbers of files.
  • Integrates with RVC, OpenVoice, and Resemble Enhance for advanced audio manipulation.
  • Allows customization of XTTS generation parameters and speaker samples.
  • Offers fine-tuning capabilities for creating custom voice models.

Maintenance & Community

Recent activity is limited: the last commit was 6 months ago, and no pull request or issue activity was recorded in the past 30 days. The README does not provide community links or a roadmap.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The "Train" tab is noted as broken; users are directed to the separate xtts-finetune-webui project for training. The portable version is Windows-only.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 49 stars in the last 90 days

Explore Similar Projects

Starred by Thomas Wolf (Cofounder of Hugging Face), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

ultravox by fixie-ai

0.4%
4k
Multimodal LLM for real-time voice interactions
created 1 year ago
updated 4 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeff Hammerbacher (Cofounder of Cloudera).

AudioGPT by AIGC-Audio

0.1%
10k
Audio processing and generation research project
created 2 years ago
updated 1 year ago