MockingBird by babysor

Voice cloning tool for generating arbitrary speech

Created 4 years ago
36,647 stars

Top 0.9% on SourcePulse

Project Summary

This project provides a voice cloning tool that clones a voice in approximately 5 seconds and generates arbitrary speech content in real time. It targets researchers and developers working on speech synthesis and voice manipulation, and offers a PyTorch-based implementation with support for Mandarin Chinese and multiple datasets.

How It Works

The system follows the three-stage architecture of Real-Time-Voice-Cloning, the project it is forked from. An encoder derives a speaker embedding from reference audio, a synthesizer generates mel-spectrograms from text conditioned on that embedding, and a vocoder converts the mel-spectrograms into audible speech waveforms. The main advantage is that the pretrained encoder and vocoder can be reused, so only the synthesizer has to be newly trained.
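The data flow between the three stages can be illustrated with a toy sketch. This is not MockingBird's code: every function name, constant, and computation below is a hypothetical stand-in chosen only to show what each stage consumes and produces. The real stages are neural networks.

```python
import math
from typing import List

# All sizes are toy values for illustration, not the project's real settings.
EMBED_DIM = 4   # length of the (hypothetical) speaker embedding
N_MELS = 3      # mel channels per frame
HOP = 2         # waveform samples produced per mel frame


def encode_speaker(reference_wav: List[float]) -> List[float]:
    """Stage 1 stand-in: reduce a reference clip to a fixed-size unit vector."""
    chunk = max(1, len(reference_wav) // EMBED_DIM)
    raw = [sum(reference_wav[i * chunk:(i + 1) * chunk]) for i in range(EMBED_DIM)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0  # avoid division by zero
    return [x / norm for x in raw]


def synthesize_mel(text: str, embedding: List[float]) -> List[List[float]]:
    """Stage 2 stand-in: one mel frame per character, shifted by the speaker."""
    bias = sum(embedding) / len(embedding)
    return [[(ord(ch) % 7) / 7.0 + bias] * N_MELS for ch in text]


def vocode(mel: List[List[float]]) -> List[float]:
    """Stage 3 stand-in: expand each mel frame into HOP waveform samples."""
    wav: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        wav.extend([math.tanh(level)] * HOP)  # tanh keeps samples in [-1, 1]
    return wav


def clone_voice(reference_wav: List[float], text: str) -> List[float]:
    """Chain the stages: reference audio + text -> waveform."""
    embedding = encode_speaker(reference_wav)
    mel = synthesize_mel(text, embedding)
    return vocode(mel)
```

Calling clone_voice(reference_samples, "你好") returns a list of waveform samples. The point of the split is that encode_speaker and vocode never see text, so in the real system those two models can stay fixed while the synthesizer is retrained for a new language.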

Quick Start & Requirements

  • Installation: pip install -r requirements.txt or use conda/mamba with env.yml.
  • Prerequisites: Python 3.7+, PyTorch 1.9.0 (with CUDA 10.2 recommended), ffmpeg.
  • M1 Mac Setup: Requires Rosetta, specific PyQt5 installation, manual path configuration for Python.h, and compiling pyworld and ctc-segmentation from source with x86 architecture emulation.
  • Models: The pretrained encoder and vocoder can be reused, but a synthesizer model compatible with Chinese symbols is required (either trained yourself or obtained as a provided checkpoint).
  • Links: Noiz.ai (cloud hosted version), DEMO VIDEO
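Before installing, it can help to confirm that the listed prerequisites are present. The following is a minimal sketch using only the Python standard library; the function name check_environment and the result labels are hypothetical and not part of the project. It checks only that PyTorch is importable, not that it is version 1.9.0 or that CUDA is available.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 7)  # taken from the prerequisites listed above


def check_environment(python_version=None):
    """Return a dict mapping each prerequisite to True or False.

    python_version is an optional (major, minor) tuple, accepted only so the
    check can be exercised without switching interpreters.
    """
    version = tuple(python_version or sys.version_info[:2])
    return {
        "python>=3.7": version >= MIN_PYTHON,
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "torch importable": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

Running the script prints one line per prerequisite. Any MISSING entry should be resolved before running pip install -r requirements.txt.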

Highlighted Details

  • Supports Chinese language synthesis with multiple datasets (aidatatang_200zh, magicdata, aishell3, data_aishell).
  • Offers a web server for remote calling and a command-line interface.
  • Provides guidance on training custom encoder, synthesizer, and vocoder models.
  • Includes troubleshooting tips for VRAM limitations, page file size, and model loading errors.

Maintenance & Community

The repository is no longer actively updated by the original author, who is focusing on a commercialized version at noiz.ai. Community contributions may exist for specific issues or model sharing.

Licensing & Compatibility

The README does not explicitly state a license. The project is a fork of Real-Time-Voice-Cloning, which is released under the MIT license. Because this fork carries no clear license statement of its own, verify the terms before commercial use or closed-source linking.

Limitations & Caveats

The project is not actively maintained, and the original demo_toolbox.py may not work with newer PyTorch versions or on M1 Macs without significant workarounds. Training custom models requires substantial computational resources and dataset preparation. Compatibility with specific pre-trained models is version-dependent.

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 128 stars in the last 30 days
