Voice cloning for real-time speech generation
This repository implements a real-time voice cloning system based on the three-stage SV2TTS deep learning framework: it clones a voice from a few seconds of reference audio and then generates arbitrary speech in that voice. It is aimed primarily at researchers and developers interested in TTS and voice synthesis, offering an open-source solution for voice cloning.
How It Works
The SV2TTS framework applies transfer learning across three stages. First, a speaker encoder trained with a generalized end-to-end (GE2E) loss for speaker verification distills a short audio sample into a compact voice embedding. That embedding then conditions a Tacotron-based synthesizer, which maps text to a mel spectrogram, and a WaveRNN vocoder, which converts the spectrogram to a waveform quickly enough for real-time generation.
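The GE2E objective behind the speaker encoder can be illustrated with a small NumPy sketch (illustrative only; the repository implements this in PyTorch, and `w`/`b` are learned scaling parameters there). Each utterance embedding is scored against every speaker centroid, with the utterance left out of its own speaker's centroid so it cannot trivially match itself; training then pushes each utterance toward its own centroid and away from the others.

```python
import numpy as np

def ge2e_similarity(embeds, w=10.0, b=-5.0):
    """GE2E-style similarity matrix.

    embeds: array of shape (n_speakers, n_utterances, dim).
    Returns sim of shape (n_speakers, n_utterances, n_speakers),
    where sim[j, i, k] scores utterance i of speaker j against
    the centroid of speaker k.
    """
    n_spk, n_utt, _ = embeds.shape
    # L2-normalize utterance embeddings and speaker centroids
    embeds = embeds / np.linalg.norm(embeds, axis=-1, keepdims=True)
    centroids = embeds.mean(axis=1)
    centroids = centroids / np.linalg.norm(centroids, axis=-1, keepdims=True)

    sim = np.zeros((n_spk, n_utt, n_spk))
    for j in range(n_spk):
        for i in range(n_utt):
            for k in range(n_spk):
                if k == j:
                    # "exclusive" centroid: leave out utterance i itself
                    c = (embeds[j].sum(axis=0) - embeds[j, i]) / (n_utt - 1)
                    c = c / np.linalg.norm(c)
                else:
                    c = centroids[k]
                sim[j, i, k] = w * np.dot(embeds[j, i], c) + b
    return sim
```

With well-clustered embeddings, the highest score along the last axis falls on the utterance's own speaker, which is exactly what the cross-entropy loss in GE2E encourages.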
Quick Start & Requirements
Install PyTorch (CPU or GPU/CUDA) and ffmpeg first, then:

pip install -r requirements.txt
python demo_cli.py       # command-line demo
python demo_toolbox.py   # graphical toolbox

Highlighted Details
Maintenance & Community
The project originated as a master's thesis implementation, and its README notes that it has "quickly gotten old." It suggests checking out CoquiTTS or MetaVoice-1B for higher-quality open-source alternatives. No community links (Discord/Slack) or active maintenance signals are provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README text. Users should verify licensing for commercial or closed-source use.
Limitations & Caveats
The project acknowledges that its audio quality may be surpassed by modern SaaS solutions and other open-source projects like CoquiTTS. It also notes potential setup issues related to dependencies and X-server errors during toolbox execution.