xtts-webui by daswer123

WebUI for XTTS, a text-to-speech model, and fine-tuning

Created 2 years ago

886 stars

Top 40.2% on SourcePulse

Project Summary

XTTS-WebUI provides a web interface for the XTTS speech synthesis model, aimed at users who want to generate speech, clone voices, and post-process audio. It offers batch processing, translation with voice saving, and integration with other AI voice tools such as RVC and OpenVoice, so content creators and developers can run these audio workflows from a single interface.

How It Works

The web UI uses XTTSv2 for speech synthesis and adds further neural networks and audio tools to improve output quality. It supports batch processing of multiple files, voice cloning, and translation. The interface also includes fine-tuning of XTTS models for custom voices, although the "Train" tab is currently noted as broken (see Limitations & Caveats). Tools such as RVC, OpenVoice, and Resemble Enhance are integrated as modular post-processing steps.
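The batch flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's own code: the function names plan_batch, run_batch, and make_xtts_synthesizer are hypothetical, and the synthesis step assumes the Coqui TTS Python package is installed and that its XTTS v2 model is loaded through TTS.api. The README summary does not confirm that xtts-webui calls the model this way.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, suffix=".txt"):
    """Pair each text file in input_dir with a .wav path in output_dir."""
    out = Path(output_dir)
    sources = sorted(Path(input_dir).glob("*" + suffix))
    return [(src, out / (src.stem + ".wav")) for src in sources]


def run_batch(jobs, synthesize):
    """Call synthesize(text, wav_path) for every job; return the paths handled."""
    written = []
    for text_path, wav_path in jobs:
        text = text_path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty inputs instead of synthesizing nothing
        wav_path.parent.mkdir(parents=True, exist_ok=True)
        synthesize(text, wav_path)
        written.append(wav_path)
    return written


def make_xtts_synthesizer(speaker_wav, language="en"):
    """Build a synthesize() callable backed by Coqui TTS (assumed installed)."""
    # Imported lazily so the planning helpers work without the TTS package.
    from TTS.api import TTS

    # The model is downloaded on first use.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

    def synthesize(text, wav_path):
        # speaker_wav is the reference sample whose voice is cloned.
        tts.tts_to_file(
            text=text,
            speaker_wav=str(speaker_wav),
            language=language,
            file_path=str(wav_path),
        )

    return synthesize
```

A run would look like run_batch(plan_batch("texts", "out"), make_xtts_synthesizer("speaker.wav")), where the directory and file names are placeholders. Passing the synthesizer in as a callable keeps the file handling testable without a GPU or a model download.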

Quick Start & Requirements

Installation: Run install.bat (Windows) or install.sh (Linux), then launch with start_xtts_webui.bat (Windows) or start_xtts_webui.sh (Linux).
Prerequisites: Python 3.10.x or 3.11, CUDA 11.8 or 12.1, Microsoft Build Tools 2019 (with C++ package), ffmpeg.
Hardware: NVIDIA GPU with 6GB VRAM recommended for portable version.
Documentation: https://github.com/daswer123/xtts-webui
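Before running the installer, the listed prerequisites can be checked with a short script. The sketch below is an illustration under stated assumptions, not part of the project: the helper names are hypothetical, and it treats the presence of nvidia-smi on PATH as a rough sign that an NVIDIA driver is installed. It does not verify the CUDA version, VRAM, or Microsoft Build Tools.

```python
import shutil
import sys

# Python minor versions listed in the prerequisites above.
SUPPORTED_PYTHON = {(3, 10), (3, 11)}


def python_supported(version=None):
    """Return True if the (major, minor) version is one of the listed ones."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHON


def missing_tools(tools=("ffmpeg", "nvidia-smi"), which=shutil.which):
    """Return the names of command-line tools that are not found on PATH."""
    # nvidia-smi is an assumption: it only hints that an NVIDIA driver exists.
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    if not python_supported():
        print("Python %d.%d is outside the listed 3.10/3.11 range" % sys.version_info[:2])
    for name in missing_tools():
        print("not found on PATH:", name)
```

The script prints nothing when every check passes, so any output points to something to fix before installing.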

Highlighted Details

Supports batch processing for dubbing large numbers of files.
Integrates with RVC, OpenVoice, and Resemble Enhance for advanced audio manipulation.
Allows customization of XTTS generation parameters and speaker samples.
Offers fine-tuning capabilities for creating custom voice models.

Maintenance & Community

The project appears to be inactive: the last commit was about a year ago, and no pull requests or issues were recorded in the last 30 days. The README does not provide community links or a roadmap.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The "Train" tab is noted as broken, with users directed to a separate xtts-finetune-webui for training. The portable version is Windows-only.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

4 stars in the last 30 days

Explore Similar Projects

open-dubbing by Softcatala

AI dubbing system for videos

Created 1 year ago

Updated 11 months ago

Starred by Johannes Schickling (Cofounder of Prisma) and Jonathan Ragan-Kelley (Professor at MIT).

sag by steipete

Modern TTS CLI inspired by macOS say

Created 6 months ago

Updated 2 days ago

xtts2-ui by BoltzmannEntropy

UI for text-based voice cloning using a 10-second audio sample

Created 2 years ago

Updated 1 year ago

Faster-Whisper-TransWithAI-ChickenRice by TransWithAI

Optimized Japanese-to-Chinese audio/video transcription and translation

Created 7 months ago

Updated 2 days ago

openedai-speech by matatonic

OpenAI API-compatible server for text-to-speech

Created 2 years ago

Updated 1 year ago

vits-simple-api by Artrajz

HTTP API for VITS-based text-to-speech and voice conversion

Created 3 years ago

Updated 3 weeks ago

ComfyUI-Qwen-TTS by flybirdxx

Advanced ComfyUI nodes for speech synthesis and voice AI

Created 4 months ago

Updated 1 week ago

Starred by Omar Sanseviero (DevRel at Google DeepMind), Jonathan Ragan-Kelley (Professor at MIT), and 3 more.

WhisperSpeech by WhisperSpeech

Open-source text-to-speech system built by inverting Whisper

Created 3 years ago

Updated 6 months ago

alltalk_tts by erew123

Text-to-speech tool based on Coqui TTS engine

Created 2 years ago

Updated 5 months ago

easyVoice by cosin2077

Text-to-speech tool for long texts and multi-character dubbing

Created 1 year ago

Updated 4 months ago

Starred by Abubakar Abid (Cofounder of Gradio).

voice-pro by abus-aikorea

WebUI for speech recognition, translation, and dubbing

Created 1 year ago

Updated 6 months ago

CosyVoice by FunAudioLLM

Voice generation model for inference, training, and deployment

Created 1 year ago

Updated 2 weeks ago
