AI voice cloning toolkit
This repository provides a fork of the Tortoise TTS AI voice cloning project, offering enhanced features like multi-language training, faster inference via Hifigan, Whisper-v3 integration, and RVC output conversion. It targets users interested in generating synthetic speech and cloning voices, providing a more accessible and updated alternative to the original archived project.
How It Works
The project leverages deep learning models for voice cloning, building upon the Tortoise TTS architecture. It incorporates Hifigan for accelerated inference, potentially trading some audio fidelity for speed. The integration of Whisper-v3 via whisperx allows for improved speech-to-text processing during training or inference. A key enhancement is the optional RVC (Retrieval-based Voice Conversion) integration, enabling further voice manipulation and style transfer using pre-trained RVC models.
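The stages described above compose into a short pipeline: Tortoise-based synthesis conditioned on reference clips of the target voice, a choice of decoding path (Hifigan for speed), and an optional RVC pass. The sketch below is illustrative only. `Clip`, `synthesize`, `rvc_convert`, `clone_voice` and the `fast_decoder` flag are hypothetical names, not this project's API, and the "audio" is a trace of stage names so the example runs without model weights. It assumes that Hifigan is a user-selectable alternative to the standard decoding path.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clip:
    """Placeholder for audio: records which stages touched it instead of samples."""
    text: str
    stages: List[str] = field(default_factory=list)


def synthesize(text: str, reference_clips: List[str], fast_decoder: bool) -> Clip:
    # Hypothetical stand-in for Tortoise generation conditioned on the
    # reference clips of the voice being cloned.
    clip = Clip(text=text, stages=[f"tortoise[{len(reference_clips)} refs]"])
    # Assumed switch: Hifigan (faster, possibly lower fidelity) vs. the
    # standard decoding path.
    clip.stages.append("hifigan" if fast_decoder else "default-decoder")
    return clip


def rvc_convert(clip: Clip, rvc_model: str) -> Clip:
    # Hypothetical stand-in for Retrieval-based Voice Conversion applied to
    # the finished output with a pre-trained RVC model.
    clip.stages.append(f"rvc[{rvc_model}]")
    return clip


def clone_voice(text: str, reference_clips: List[str],
                fast_decoder: bool = True,
                rvc_model: Optional[str] = None) -> Clip:
    if not reference_clips:
        raise ValueError("voice cloning needs at least one reference clip")
    clip = synthesize(text, reference_clips, fast_decoder)
    # RVC is optional: it only runs when a model is supplied.
    if rvc_model is not None:
        clip = rvc_convert(clip, rvc_model)
    return clip


if __name__ == "__main__":
    out = clone_voice("Hello there.", ["ref1.wav", "ref2.wav"], rvc_model="my_voice.pth")
    print(out.stages)
```

Keeping RVC as a post-processing step on finished audio means it can be switched on or off without touching the synthesis stage, which matches its description here as an optional add-on.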
Quick Start & Requirements
On Windows, run start.bat. Requires Python 3.11 and 7zip (optional but recommended).
With Docker, run ./setup-docker.sh to build and ./start-docker.sh to run.
Once running, access the interface at http://localhost:7860.
Highlighted Details
Maintenance & Community
The original author has archived the project and does not plan further active maintenance. This fork aims to keep the repository functional. Users are encouraged to report bugs via the GitHub issues tab.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is marked as archived by its original author, with this fork intended primarily for maintenance rather than new feature development. Users are directed to newer, potentially superior alternatives like XTTS, GPT-SoVITS, and StyleTTS2. Manual installation requires careful attention to Python version and dependency management.
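Because a manual install hinges on the Python version, a quick preflight check can catch problems before dependencies are installed. The sketch below is a hypothetical helper, not part of the repository. It assumes "Requires Python 3.11" means the 3.11.x series exactly, that 7-Zip's command-line tool is on PATH as `7z`, `7za` or `7zz`, and that the interface answers plain HTTP at http://localhost:7860 once started. It uses only the Python standard library.

```python
import http.client
import shutil
import sys
import urllib.error
import urllib.request
from typing import Optional, Sequence

REQUIRED_PYTHON = (3, 11)          # version stated in Quick Start
UI_URL = "http://localhost:7860"   # address stated in Quick Start


def python_version_ok(version_info: Sequence[int] = sys.version_info) -> bool:
    # Assumption: only the 3.11.x series is supported, since pinned
    # dependencies often break on other minor versions.
    return tuple(version_info[:2]) == REQUIRED_PYTHON


def find_7zip() -> Optional[str]:
    # Assumption: 7-Zip's CLI is installed under one of these common names.
    for name in ("7z", "7za", "7zz"):
        path = shutil.which(name)
        if path:
            return path
    return None


def ui_reachable(url: str = UI_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at url (e.g. after start-docker.sh)."""
    # Bypass any configured HTTP proxy: the interface is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is still up.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timed out, or malformed response.
        return False


if __name__ == "__main__":
    found = sys.version.split()[0]
    print("Python 3.11:", "ok" if python_version_ok() else f"found {found}")
    print("7-Zip:", find_7zip() or "not found (optional but recommended)")
    print("Web UI:", "up" if ui_reachable() else "not reachable yet")
```

`HTTPError` is caught before `OSError` because it is a subclass of it; reversing the order would report a running server that returns an error page as unreachable.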