ai-voice-cloning by JarodMica

AI voice cloning toolkit

Created 2 years ago

785 stars

Top 44.8% on SourcePulse

Project Summary

This repository is a fork of an archived AI voice cloning project built on Tortoise TTS. It adds multi-language training, faster inference via Hifigan, Whisper-v3 integration, and RVC output conversion. It targets users who want to generate synthetic speech and clone voices, and it offers a more accessible, up-to-date alternative to the original archived project.

How It Works

The project builds on the Tortoise TTS architecture for deep-learning-based voice cloning. Hifigan is included to accelerate inference, potentially trading some audio fidelity for speed. Whisper-v3, integrated through whisperx, improves speech-to-text processing during training or inference. A key addition is optional RVC (Retrieval-based Voice Conversion) output conversion, which passes generated audio through pre-trained RVC models for further voice manipulation and style transfer.

Quick Start & Requirements

Windows: Download and extract the latest release package from Hugging Face. Run start.bat. Requires Python 3.11 and 7zip (optional but recommended).
Linux/WSL2: Requires Docker and NVIDIA Container Toolkit. Clone the repo, run ./setup-docker.sh to build, and ./start-docker.sh to run. Access via http://localhost:7860.
Dependencies: NVIDIA GPU with CUDA support is essential. Python 3.11 is required for manual installation. Additional models are downloaded on first use.
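
The requirements above can be checked before starting an install. The following is a minimal sketch, not part of the repository: the function names check_prerequisites and ui_is_reachable are hypothetical, and treating an nvidia-smi executable on the PATH as evidence of a working NVIDIA driver is an assumption. The Python 3.11 check matters for manual installation; the Docker route may not depend on the host interpreter.

```python
import http.client
import shutil
import sys
import urllib.request


def check_prerequisites(which=shutil.which, version=sys.version_info):
    """Report which documented prerequisites appear to be present.

    `which` and `version` are injectable so the check can be exercised
    without the real tools installed.
    """
    return {
        # Python 3.11 is the version named for manual installation.
        "python_3_11": tuple(version[:2]) == (3, 11),
        # Docker is needed for the Linux/WSL2 route.
        "docker": which("docker") is not None,
        # Assumption: nvidia-smi on PATH is used as a proxy for an NVIDIA driver.
        "nvidia_smi": which("nvidia-smi") is not None,
    }


def ui_is_reachable(url="http://localhost:7860", timeout=3.0):
    """Return True if the web UI answers an HTTP request at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a non-HTTP reply.
        return False


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
    print("web UI reachable:", ui_is_reachable())
```

Running the script after ./start-docker.sh should report the web UI as reachable once the container has finished starting; a "missing" entry only means the tool was not found on the PATH, not that the installation is certain to fail.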

Highlighted Details

Supports training in multiple languages.
Includes Hifigan for faster inference.
Integrates Whisper-v3 for enhanced transcription.
Adds RVC output conversion for advanced voice manipulation.

Maintenance & Community

The original author has archived the project and does not plan further active maintenance. This fork aims to keep the repository functional. Users are encouraged to report bugs via the GitHub issues tab.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is marked as archived by its original author, with this fork intended primarily for maintenance rather than new feature development. Users are directed to newer, potentially superior alternatives like XTTS, GPT-SoVITS, and StyleTTS2. Manual installation requires careful attention to Python version and dependency management.
