Voice cloning tool for generating arbitrary speech
This project provides a voice cloning tool that allows users to clone a voice from approximately 5 seconds of reference audio and generate arbitrary speech content in real-time. It is targeted at researchers and developers interested in speech synthesis and voice manipulation, offering a PyTorch-based implementation with support for Chinese and multiple datasets.
How It Works
The system leverages a multi-stage approach, likely based on the Real-Time-Voice-Cloning architecture. It involves an encoder to capture speaker embeddings, a synthesizer to generate mel-spectrograms from text and speaker embeddings, and a vocoder to convert mel-spectrograms into audible speech waveforms. The advantage lies in its ability to reuse pre-trained encoders and vocoders, allowing for rapid synthesis with a newly trained synthesizer.
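The encoder → synthesizer → vocoder flow described above can be sketched with toy stand-in functions. Every function name, shape, and constant below is illustrative only and is not the project's actual API (the real repository uses pretrained PyTorch networks for each stage):

```python
# Toy sketch of the three-stage cloning pipeline: reference audio is
# reduced to a speaker embedding, text plus embedding becomes a
# mel-spectrogram, and the mel-spectrogram is rendered to a waveform.
import math
import random

def encode_speaker(reference_wav):
    """Encoder stand-in: map a reference waveform to a fixed-size embedding.
    Real systems use a trained speaker-verification network; here we just
    pad a few crude waveform statistics to a fixed 8-dim vector."""
    n = len(reference_wav)
    mean = sum(reference_wav) / n
    energy = math.sqrt(sum(x * x for x in reference_wav) / n)
    return [mean, energy] + [0.0] * 6

def synthesize_mel(text, speaker_embed, n_mels=4):
    """Synthesizer stand-in: (text, embedding) -> mel-spectrogram frames.
    One frame per character, biased by the speaker embedding."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0
        frames.append([base + speaker_embed[0]] * n_mels)
    return frames

def vocode(mel_frames, hop=4):
    """Vocoder stand-in: mel-spectrogram -> waveform samples,
    expanding each frame into `hop` samples."""
    wav = []
    for frame in mel_frames:
        level = sum(frame) / len(frame)
        wav.extend([level] * hop)
    return wav

# End-to-end: ~5 s of reference audio -> embedding -> mel -> waveform.
reference = [random.uniform(-1, 1) for _ in range(16000)]
embedding = encode_speaker(reference)
mel = synthesize_mel("hello", embedding)
waveform = vocode(mel)
```

The key property this structure gives the real system is the one noted above: the encoder and vocoder stages are speaker- and text-agnostic, so pretrained versions can be reused while only the synthesizer is retrained.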
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt, or create an environment with conda or mamba from env.yml.
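A minimal setup sketch, assuming the repository root contains the requirements.txt and env.yml files mentioned above (the environment name is defined inside env.yml):

```shell
# Option 1: pip
pip install -r requirements.txt

# Option 2: conda or mamba environment from the provided spec
conda env create -f env.yml    # or: mamba env create -f env.yml
conda activate <name-from-env.yml>
```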
On M1 (Apple Silicon) Macs, setup may additionally require the Python development headers (Python.h) and compiling pyworld and ctc-segmentation from source under x86 architecture emulation.
Highlighted Details
Maintenance & Community
The repository is no longer actively updated by the original author, who is focusing on a commercialized version at noiz.ai. Community contributions may exist for specific issues or model sharing.
Licensing & Compatibility
The README does not explicitly state a license. The project is a fork of Real-Time-Voice-Cloning, which is released under the MIT license; however, the absence of a clear license in this fork means commercial use or closed-source linking requires verification.
Limitations & Caveats
The project is not actively maintained, and the original demo_toolbox.py may not work with newer PyTorch versions or on M1 Macs without significant workarounds. Training custom models requires substantial computational resources and dataset preparation. Compatibility with specific pre-trained models is version-dependent.
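Given the version- and platform-specific caveats above, a quick environment check can surface problems before running the demo. This is a hedged, stdlib-only sketch (the reported fields are chosen here for illustration, not prescribed by the project):

```python
# Report facts relevant to the caveats: Python version, CPU architecture
# (arm64 on macOS indicates an Apple Silicon / M1 machine), and whether
# torch is importable at all.
import importlib.util
import platform
import sys

def environment_report():
    """Collect environment facts as a plain dict."""
    machine = platform.machine()  # e.g. 'x86_64' or 'arm64'
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "machine": machine,
        "apple_silicon": machine == "arm64" and sys.platform == "darwin",
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }

report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If torch_installed is False, or the machine is Apple Silicon, the demo_toolbox.py workarounds mentioned above are likely to apply.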