ai-voice-cloning by JarodMica

AI voice cloning toolkit

created 1 year ago
780 stars

Top 45.7% on sourcepulse

Project Summary

This repository is a fork of the Tortoise TTS AI voice cloning project. It adds multi-language training, faster inference via Hifigan, Whisper-v3 integration, and RVC output conversion. It targets users who want to generate synthetic speech and clone voices, and it offers a more accessible, updated alternative to the original archived project.

How It Works

The project uses deep learning models for voice cloning and builds on the Tortoise TTS architecture. It incorporates Hifigan to speed up inference, potentially trading some audio fidelity for speed. Whisper-v3, integrated via whisperx, improves speech-to-text processing during training or inference. A key enhancement is the optional RVC (Retrieval-based Voice Conversion) integration, which passes the generated audio through a pre-trained RVC model for further voice manipulation and style transfer.

Quick Start & Requirements

  • Windows: Download and extract the latest release package from Hugging Face. Run start.bat. Requires Python 3.11 and 7zip (optional but recommended).
  • Linux/WSL2: Requires Docker and NVIDIA Container Toolkit. Clone the repo, run ./setup-docker.sh to build, and ./start-docker.sh to run. Access via http://localhost:7860.
  • Dependencies: An NVIDIA GPU with CUDA support is required. Python 3.11 is required for manual installation. Additional models are downloaded on first use.
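
The requirements above can be checked before starting a long Docker build. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and looking for `docker` and `nvidia-smi` on PATH is only an assumed proxy for "Docker is installed" and "an NVIDIA driver is present". It does not verify the NVIDIA Container Toolkit or CUDA itself.

```python
import shutil
import sys
import urllib.error
import urllib.request


def missing_tools(tools=("docker", "nvidia-smi")):
    """Return the names from `tools` that are not found on PATH.

    The default tool names are an assumption about a typical Linux/WSL2
    setup; they are not taken from the project's scripts.
    """
    return [name for name in tools if shutil.which(name) is None]


def python_is_311(version=sys.version_info):
    """True when the interpreter is Python 3.11 (needed for manual installs)."""
    return tuple(version[:2]) == (3, 11)


def ui_is_up(url="http://localhost:7860", timeout=3.0):
    """True if something answers HTTP at `url`, False if nothing is reachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A server replied, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    print("Python 3.11:", python_is_311())
    print("Web UI reachable:", ui_is_up())
```

Run it once before `./setup-docker.sh` to catch a missing tool, and again after `./start-docker.sh` to confirm that the web UI responds on port 7860.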

Highlighted Details

  • Supports training in multiple languages.
  • Includes Hifigan for faster inference.
  • Integrates Whisper-v3 for enhanced transcription.
  • Adds RVC output conversion for advanced voice manipulation.

Maintenance & Community

The original author has archived the project and does not plan further active maintenance. This fork aims to keep the repository functional. Users are encouraged to report bugs via the GitHub issues tab.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is marked as archived by its original author, with this fork intended primarily for maintenance rather than new feature development. Users are directed to newer, potentially superior alternatives like XTTS, GPT-SoVITS, and StyleTTS2. Manual installation requires careful attention to Python version and dependency management.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 26 stars in the last 90 days
