Real-Time-Voice-Cloning by CorentinJ

Voice cloning for real-time speech generation

created 6 years ago
54,781 stars


Project Summary

This repository implements a real-time voice cloning system based on a three-stage deep learning framework (SV2TTS). It can clone a voice from a few seconds of audio and then generate arbitrary speech in that voice. It is aimed primarily at researchers and developers working on TTS and voice synthesis who want an open-source voice cloning solution.

How It Works

The SV2TTS framework relies on transfer learning across three separately trained stages. First, a speaker encoder trained on speaker verification with the generalized end-to-end (GE2E) loss turns a short audio sample into a compact, fixed-size voice embedding. Second, a Tacotron-based synthesizer, conditioned on that embedding, generates a mel spectrogram from text. Third, a WaveRNN vocoder converts the spectrogram into a waveform. Because the encoder needs only a few seconds of reference audio, a new voice can be cloned quickly and speech generated in real time.
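The GE2E loss is the least familiar part of this pipeline, so here is a minimal NumPy sketch of its softmax variant as described in the GE2E paper. It is not the repository's code (the project itself uses PyTorch and may differ in detail). Each utterance embedding is compared by cosine similarity with every speaker centroid; for its own speaker, the centroid is computed without that utterance. The scaled similarities are then pushed toward the correct speaker with a softmax cross-entropy. The scale `w` and bias `b` are fixed here as an assumption, whereas the paper learns them (initialised to 10 and -5).

```python
import numpy as np


def _unit(x: np.ndarray) -> np.ndarray:
    """L2-normalise along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def ge2e_softmax_loss(embeds: np.ndarray, w: float = 10.0, b: float = -5.0) -> float:
    """GE2E softmax loss for a batch shaped (n_speakers, n_utterances, dim).

    Sketch only: w and b are fixed constants here, not learned parameters.
    """
    n_spk, n_utt, _ = embeds.shape
    if n_utt < 2:
        raise ValueError("need at least 2 utterances per speaker")
    e = _unit(embeds)

    # Inclusive centroid of each speaker: mean of all its utterance embeddings.
    sums = e.sum(axis=1)                                   # (N, D)
    centroids = _unit(sums / n_utt)                        # (N, D)
    # Exclusive centroid: speaker j's centroid with utterance i left out,
    # used only when comparing an utterance with its own speaker.
    excl = _unit((sums[:, None, :] - e) / (n_utt - 1))     # (N, M, D)

    # Cosine similarity of every utterance to every speaker centroid: (N, M, N).
    cos = np.einsum("jmd,kd->jmk", e, centroids)
    own = np.einsum("jmd,jmd->jm", e, excl)                # (N, M)
    for j in range(n_spk):
        cos[j, :, j] = own[j]

    sim = w * cos + b
    # Per-utterance loss: -S[j,i,j] + log(sum_k exp(S[j,i,k])),
    # using a numerically stable log-sum-exp.
    peak = sim.max(axis=2, keepdims=True)
    lse = peak[..., 0] + np.log(np.exp(sim - peak).sum(axis=2))
    target = np.stack([sim[j, :, j] for j in range(n_spk)])  # (N, M)
    return float((lse - target).sum())


if __name__ == "__main__":
    # Four speakers, three identical utterances each, on orthogonal axes.
    separated = np.repeat(np.eye(4)[:, None, :], 3, axis=1)
    random = np.random.default_rng(0).standard_normal((4, 3, 4))
    print(ge2e_softmax_loss(separated), ge2e_softmax_loss(random))
```

Well-separated speakers drive the loss close to zero, while random embeddings produce a noticeably larger value. This is the pressure that shapes the encoder's embedding space so that voices of the same speaker cluster together.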

Quick Start & Requirements

  • Install PyTorch (CPU or CUDA build) and ffmpeg first, then run pip install -r requirements.txt.
  • Recommended: Python 3.7+ and a GPU for faster inference.
  • Pretrained models are downloaded automatically.
  • Test your configuration with python demo_cli.py.
  • Launch the toolbox with python demo_toolbox.py.
  • Official Docs: https://github.com/CorentinJ/Real-Time-Voice-Cloning
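Before running demo_cli.py, a quick pre-flight check can confirm that the prerequisites listed above are present. The script below is a hypothetical helper (for example, saved as check_env.py) and is not part of the repository. It only checks that Python, PyTorch, and ffmpeg can be found. It does not verify that their versions are compatible with requirements.txt.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which Quick Start prerequisites appear to be installed."""
    results = {
        "Python 3.7+": sys.version_info >= (3, 7),
        "PyTorch installed": importlib.util.find_spec("torch") is not None,
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    if results["PyTorch installed"]:
        import torch  # imported lazily so the check still runs without PyTorch

        # A GPU is only recommended, so a False here is not fatal.
        results["CUDA GPU available"] = bool(torch.cuda.is_available())
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"[{'ok' if ok else 'no'}] {name}")
```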

Highlighted Details

  • Implements the methods of the SV2TTS, WaveRNN, and GE2E papers.
  • Generates speech in real time.
  • Requires only a few seconds of audio to clone a voice.
  • Supports Windows and Linux.
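The real-time claim is not quantified on this page. A common way to check it on your own hardware is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio, where values below 1 mean faster than real time. This is a minimal sketch. The `synthesize` callable is a hypothetical stand-in for whatever produces audio samples, and the sample rate is passed in rather than taken from the repository's configuration.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesize: Callable[[], Sequence[float]], sample_rate: int) -> float:
    """Return elapsed seconds divided by audio seconds for one synthesis call.

    `synthesize` is a hypothetical zero-argument callable returning audio samples.
    """
    start = time.perf_counter()
    samples = synthesize()
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


if __name__ == "__main__":
    def fake_synth():
        # Placeholder: pretend 50 ms of work yields 1 second of 16 kHz silence.
        time.sleep(0.05)
        return [0.0] * 16000

    print(f"RTF: {real_time_factor(fake_synth, 16000):.3f}")
```

Measure after a warm-up call, since the first inference on a GPU often includes one-off initialisation cost.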

Maintenance & Community

The project is a master's thesis implementation, and its README notes that it has "quickly gotten old." It suggests CoquiTTS or MetaVoice-1B as higher-quality open-source alternatives. The README provides no community links (such as Discord or Slack) and no signs of active maintenance.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The project acknowledges that its audio quality may be surpassed by modern SaaS solutions and other open-source projects like CoquiTTS. It also notes potential setup issues related to dependencies and X-server errors during toolbox execution.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 4
  • Star history: 834 stars in the last 90 days
