Real-Time-Voice-Cloning by CorentinJ

Voice cloning for real-time speech generation

Created 6 years ago
56,281 stars

Top 0.4% on SourcePulse

Project Summary

This repository implements a real-time voice cloning system built on a three-stage deep learning framework (SV2TTS). Given a few seconds of reference audio, it can generate arbitrary speech in that voice. It is aimed primarily at researchers and developers interested in TTS and voice synthesis who want an open-source voice cloning solution.

How It Works

The SV2TTS framework relies on transfer learning. The first stage is a speaker encoder trained for speaker verification with the generalized end-to-end (GE2E) loss; it produces a compact voice embedding from a short audio sample. In the second stage, that embedding conditions a Tacotron-based synthesizer, which turns input text into a mel spectrogram. In the third stage, a WaveRNN vocoder converts the spectrogram into an audio waveform in real time.
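
To make the role of the voice embedding more concrete, here is a minimal sketch of how speaker embeddings can be compared with cosine similarity, which is the measure the GE2E loss is built around. This is not the repository's code: the function names and the 4-dimensional toy vectors are assumptions made for illustration, and a real encoder outputs much larger embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def centroid(embeddings):
    """Average several utterance embeddings that belong to one speaker."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


# Hypothetical toy embeddings; the values are made up for this example.
speaker_a = [[0.9, 0.1, 0.0, 0.1], [0.8, 0.2, 0.1, 0.0]]
speaker_b = [[0.0, 0.1, 0.9, 0.2]]
query = [0.85, 0.15, 0.05, 0.05]  # a new utterance to attribute to a speaker

sim_a = cosine_similarity(query, centroid(speaker_a))
sim_b = cosine_similarity(query, centroid(speaker_b))
print("similarity to speaker A:", round(sim_a, 3))
print("similarity to speaker B:", round(sim_b, 3))
```

An encoder trained this way pulls utterances from the same speaker toward a common centroid and pushes other speakers away, which is what lets a single short sample stand in for a voice when conditioning the synthesizer.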

Quick Start & Requirements

  • Install PyTorch (CPU or GPU/CUDA build) and ffmpeg first, then run pip install -r requirements.txt.
  • Recommended: Python 3.7+, GPU for inference speed.
  • Pretrained models are downloaded automatically.
  • Test your configuration with python demo_cli.py.
  • Launch the toolbox with python demo_toolbox.py.
  • Official Docs: https://github.com/CorentinJ/Real-Time-Voice-Cloning
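
Before running the demo scripts, a quick preflight check can confirm that the prerequisites listed above are in place. The sketch below uses only the Python standard library; the helper name check_environment is hypothetical and not part of the repository, and it only tests whether ffmpeg is on the PATH and whether a torch package can be found, not whether either one works.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 7)):
    """Return a dict mapping each prerequisite name to True or False."""
    return {
        # Interpreter version is at least the recommended minimum.
        "python": sys.version_info >= min_python,
        # An ffmpeg executable is discoverable on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # A package named torch is installed (it is not imported here).
        "torch": importlib.util.find_spec("torch") is not None,
    }


for name, ok in check_environment().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If any entry is reported as missing, install it before running pip install -r requirements.txt or the demo scripts.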

Highlighted Details

  • Implements the SV2TTS, WaveRNN, and GE2E papers.
  • Generates speech in real-time.
  • Requires minimal audio input for cloning.
  • Supports Windows and Linux.

Maintenance & Community

The project is a master's thesis implementation and notes that it has "quickly gotten old." It suggests checking out CoquiTTS or MetaVoice-1B for higher quality open-source solutions. No specific community links (Discord/Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The project acknowledges that its audio quality may be surpassed by modern SaaS solutions and other open-source projects like CoquiTTS. It also notes potential setup issues related to dependencies and X-server errors during toolbox execution.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 13
  • Issues (30d): 23

Star History

1,606 stars in the last 30 days

Explore Similar Projects

AudioGPT by AIGC-Audio

  • Audio processing and generation research project
  • 0.0%, 10k
  • Created 2 years ago; updated 1 year ago
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jeff Hammerbacher (cofounder of Cloudera), and 2 more

OpenVoice by myshell-ai

  • Audio foundation model for versatile, instant voice cloning
  • 0.2%, 34k
  • Created 1 year ago; updated 5 months ago
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Junyang Lin (core maintainer at Alibaba Qwen), and 6 more

GPT-SoVITS by RVC-Boss

  • Few-shot voice cloning and TTS web UI
  • 0.3%, 51k
  • Created 1 year ago; updated 1 week ago
  • Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems")