MockingBird by babysor

Voice cloning tool for generating arbitrary speech

created 4 years ago
36,501 stars

Top 0.8% on sourcepulse

Project Summary

This project provides a voice cloning tool that clones a voice in about 5 seconds and generates arbitrary speech in real time. It targets researchers and developers interested in speech synthesis and voice manipulation, and offers a PyTorch-based implementation with Mandarin Chinese support, tested against multiple datasets.

How It Works

The system follows the three-stage Real-Time-Voice-Cloning architecture, of which the project is a fork. An encoder derives a speaker embedding from reference audio, a synthesizer generates a mel-spectrogram from text conditioned on that embedding, and a vocoder converts the mel-spectrogram into an audible waveform. Because the pre-trained encoder and vocoder can be reused, only the synthesizer has to be newly trained to get usable results.
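The data flow described above can be sketched in a few lines. This is a toy illustration only: the function names (`encode_speaker`, `synthesize_mel`, `vocode`, `clone_voice`), the constants, and the arithmetic inside each stage are hypothetical stand-ins, not the project's API or models. It shows only how the three stages hand data to each other.

```python
import math
from typing import List

# All sizes below are hypothetical and far smaller than in a real model.
EMBED_DIM = 8   # length of the speaker embedding
N_MELS = 4      # mel channels per frame
HOP = 5         # waveform samples produced per mel frame


def encode_speaker(reference_wav: List[float]) -> List[float]:
    """Stand-in encoder: reduce reference audio to a fixed-size unit vector."""
    chunk = max(1, len(reference_wav) // EMBED_DIM)
    raw = [sum(reference_wav[i * chunk:(i + 1) * chunk]) for i in range(EMBED_DIM)]
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0  # avoid division by zero
    return [x / norm for x in raw]


def synthesize_mel(text: str, embedding: List[float]) -> List[List[float]]:
    """Stand-in synthesizer: one mel frame per character, shifted by the embedding."""
    bias = sum(embedding) / len(embedding)
    return [[(ord(ch) % 32) / 32.0 + bias for _ in range(N_MELS)] for ch in text]


def vocode(mel: List[List[float]]) -> List[float]:
    """Stand-in vocoder: expand every mel frame into HOP waveform samples."""
    wav: List[float] = []
    for frame in mel:
        level = sum(frame) / len(frame)
        wav.extend([level] * HOP)
    return wav


def clone_voice(reference_wav: List[float], text: str) -> List[float]:
    """Chain the three stages: audio -> embedding -> mel-spectrogram -> waveform."""
    embedding = encode_speaker(reference_wav)
    mel = synthesize_mel(text, embedding)
    return vocode(mel)
```

Calling `clone_voice(reference_samples, "你好")` returns a list of `2 * HOP` samples, one block per input character. In the real system each stage is a trained neural network, and the interfaces between stages are the reason the encoder and vocoder can be swapped or reused independently of the synthesizer.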

Quick Start & Requirements

  • Installation: pip install -r requirements.txt or use conda/mamba with env.yml.
  • Prerequisites: Python 3.7+, PyTorch 1.9.0 (with CUDA 10.2 recommended), ffmpeg.
  • M1 Mac Setup: Requires Rosetta, specific PyQt5 installation, manual path configuration for Python.h, and compiling pyworld and ctc-segmentation from source with x86 architecture emulation.
  • Models: The pretrained encoder and vocoder are reused, but the synthesizer model must be compatible with Chinese symbols (either trained locally or downloaded from a provided checkpoint).
  • Links: Noiz.ai (cloud hosted version), DEMO VIDEO
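Before installing, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch using only the Python standard library; the helper name `check_environment` is hypothetical and not part of the project. It checks only that the interpreter is new enough, that an `ffmpeg` executable is on `PATH`, and that a `torch` package is importable. It does not verify the PyTorch or CUDA versions.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 7)):
    """Return a dict mapping each prerequisite name to True (found) or False."""
    return {
        # Interpreter version, compared as a (major, minor) tuple.
        "python": sys.version_info[:2] >= min_python,
        # ffmpeg must be discoverable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the top-level package is not installed.
        "torch": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any item reported as missing should be resolved before running `pip install -r requirements.txt`.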

Highlighted Details

  • Supports Chinese language synthesis with multiple datasets (aidatatang_200zh, magicdata, aishell3, data_aishell).
  • Offers a web server for remote calling and a command-line interface.
  • Provides guidance on training custom encoder, synthesizer, and vocoder models.
  • Includes troubleshooting tips for VRAM limitations, page file size, and model loading errors.

Maintenance & Community

The repository is no longer actively updated by the original author, who is focusing on a commercialized version at noiz.ai. Community contributions may exist for specific issues or model sharing.

Licensing & Compatibility

The README does not explicitly state a license. The project is a fork of Real-Time-Voice-Cloning, which is distributed under the MIT license, but because this fork does not clearly restate its terms, verify the license before commercial use or closed-source linking.

Limitations & Caveats

The project is not actively maintained, and demo_toolbox.py may not work with newer PyTorch versions or on M1 Macs without significant workarounds. Training custom models requires substantial computational resources and dataset preparation. Pre-trained models are version-dependent, so a checkpoint may fail to load with a different release of the code.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 446 stars in the last 90 days
