Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab

PyTorch for cross-lingual voice cloning research

Created 5 years ago

360 stars

Top 77.9% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of Tacotron 2, modified for cross-lingual voice cloning. It enables faster-than-realtime inference and is suitable for researchers and developers working on multilingual speech synthesis and voice conversion.

How It Works

The model is based on the Tacotron 2 architecture, a sequence-to-sequence model for text-to-spectrogram synthesis. It has been adapted to handle multiple languages and speakers by incorporating speaker and language embeddings. The training process requires dataset files mapping audio paths to corresponding text, speaker IDs, and language IDs.
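The dataset files described above can be sketched as a small parser. This is only an illustration: the pipe delimiter, the field order (audio path, text, speaker ID, language ID), and the names Utterance, parse_filelist_line, and parse_filelist are assumptions, not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One training example: where the audio lives and how it is labeled."""
    audio_path: str
    text: str
    speaker_id: int
    language_id: int


def parse_filelist_line(line: str, delimiter: str = "|") -> Utterance:
    # Assumed layout (hypothetical): audio_path|text|speaker_id|language_id
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    audio_path, text, speaker_id, language_id = fields
    return Utterance(audio_path, text, int(speaker_id), int(language_id))


def parse_filelist(lines):
    # Skip blank lines so trailing newlines in a file do not raise errors.
    return [parse_filelist_line(line) for line in lines if line.strip()]
```

The integer IDs are what a model would use to look up the speaker and language embeddings, so validating them at load time catches malformed rows before training starts.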

Quick Start & Requirements

Install: Clone the repository and install requirements: pip install -r requirements.txt.
Prerequisites: NVIDIA GPU with CUDA and cuDNN. PyTorch 1.0 and Apex are also required.
Dataset: Download and extract the LJ Speech dataset. Update the .wav paths in filelists/*.txt to point to the extracted audio, or set load_mel_from_disk=True to load precomputed mel-spectrograms instead.
Links: NVIDIA Tacotron2 (inspiration), WaveGlow (related).
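Updating the .wav paths in the filelists can be done with a short script. The sketch below assumes each filelist line starts with the audio path followed by a pipe delimiter, and that paths carry a placeholder prefix such as DUMMY; both are assumptions, as are the names rewrite_audio_path and rewrite_filelist and the example dataset directory.

```python
def rewrite_audio_path(line: str, old_prefix: str, new_prefix: str,
                       delimiter: str = "|") -> str:
    # Only the first field (assumed to be the audio path) is touched,
    # so a matching word inside the transcript text is left alone.
    path, sep, rest = line.partition(delimiter)
    if path.startswith(old_prefix):
        path = new_prefix + path[len(old_prefix):]
    return path + sep + rest


def rewrite_filelist(lines, old_prefix: str, new_prefix: str):
    return [rewrite_audio_path(line, old_prefix, new_prefix) for line in lines]


if __name__ == "__main__":
    # Hypothetical example line and dataset location.
    sample = ["DUMMY/LJ001-0001.wav|Printing, in the only sense|0|0\n"]
    for line in rewrite_filelist(sample, "DUMMY", "/data/LJSpeech-1.1/wavs"):
        print(line, end="")
```

Restricting the replacement to the path field is a deliberate choice: a global search-and-replace would also alter any transcript that happens to contain the placeholder string.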

Highlighted Details

Faster-than-realtime inference.
Supports distributed and automatic mixed-precision training via NVIDIA Apex.
Modified for cross-lingual voice cloning.
Training requires specific dataset formatting with speaker and language IDs.

Maintenance & Community

The project appears to be based on NVIDIA's Tacotron 2 implementation, with modifications by deterministic-algorithms-lab. No specific community channels or active maintenance signals are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. It references NVIDIA's Tacotron 2 repository, which is released under the BSD 3-Clause license, but this fork's own licensing is unclear. Commercial use would require clarification from the maintainers.

Limitations & Caveats

The project modifies an existing implementation and may inherit its limitations. The README lists "TODO" items, which suggests the work is unfinished. Setup depends on PyTorch 1.0 and NVIDIA Apex, and training data must follow the expected filelist format with speaker and language IDs.

Explore Similar Projects

MahaTTS by dubverse-ai

cosyvoice-api by jianchang512

Multilingual_Text_to_Speech by Tomiinek

WhisperSpeech by WhisperSpeech

metavoice-src by metavoiceio

VALL-E-X by Plachtaa

Zonos by Zyphra

Spark-TTS by SparkAudio

fish-speech by fishaudio

OpenVoice by myshell-ai

GPT-SoVITS by RVC-Boss

Real-Time-Voice-Cloning by CorentinJ