Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab

PyTorch implementation for cross-lingual voice cloning research

created 5 years ago
361 stars

Top 78.8% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Tacotron 2, modified for cross-lingual voice cloning. It enables faster-than-realtime inference and is suitable for researchers and developers working on multilingual speech synthesis and voice conversion.

How It Works

The model is based on the Tacotron 2 architecture, a sequence-to-sequence model for text-to-spectrogram synthesis. It has been adapted to handle multiple languages and speakers by incorporating speaker and language embeddings. The training process requires dataset files mapping audio paths to corresponding text, speaker IDs, and language IDs.
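The dataset files described above can be illustrated with a minimal sketch. The exact column layout used by this repository is not stated here, so the pipe-separated order `audio_path|text|speaker|language`, the `Utterance` class, and both helper functions are assumptions made for illustration only. The sketch parses such lines and derives the integer IDs that speaker and language embedding tables would typically be indexed by.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    audio_path: str
    text: str
    speaker: str
    language: str


def parse_filelist(lines, sep="|"):
    """Parse lines of the ASSUMED form: audio_path|text|speaker|language."""
    utterances = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        utterances.append(Utterance(*fields))
    return utterances


def build_index(labels):
    """Map each distinct label to an integer ID, assigned in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


# Hypothetical sample lines, not taken from the repository.
sample = [
    "wavs/a_0001.wav|Hello there.|spk_a|en",
    "wavs/b_0001.wav|Guten Tag.|spk_b|de",
    "wavs/a_0002.wav|Good morning.|spk_a|en",
]
utterances = parse_filelist(sample)
speaker_ids = build_index(u.speaker for u in utterances)
language_ids = build_index(u.language for u in utterances)
```

In this sketch a transcript that itself contains the separator character would be rejected, which is one reason a strict format check is useful before training starts.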

Quick Start & Requirements

  • Install: Clone the repository and install requirements: pip install -r requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA and cuDNN. PyTorch 1.0 and Apex are also required.
  • Dataset: Download and extract the LJ Speech dataset. Update .wav paths in filelists/*.txt or set load_mel_from_disk=True.
  • Links: NVIDIA Tacotron2 (inspiration), WaveGlow (related).
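The path update in the dataset step can be sketched as follows. This assumes the filelists follow the upstream NVIDIA convention of pipe-separated lines whose first field is a .wav path beginning with a placeholder prefix such as `DUMMY`; the function name, the sample line, and the target directory are hypothetical.

```python
def rewrite_wav_prefix(lines, old_prefix, new_prefix, sep="|"):
    """Replace old_prefix at the start of the first field of each line.

    Only the path field is touched, so the transcript text is never altered.
    """
    rewritten = []
    for line in lines:
        path, found_sep, rest = line.partition(sep)
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        rewritten.append(path + found_sep + rest)
    return rewritten


# Hypothetical filelist line and dataset location.
original = ["DUMMY/LJ001-0001.wav|Example sentence."]
updated = rewrite_wav_prefix(original, "DUMMY", "/data/LJSpeech-1.1/wavs")
```

Restricting the replacement to the first field avoids accidentally changing a transcript that happens to contain the placeholder string.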

Highlighted Details

  • Faster-than-realtime inference.
  • Supports distributed and automatic mixed-precision training via NVIDIA Apex.
  • Modified for cross-lingual voice cloning.
  • Training requires specific dataset formatting with speaker and language IDs.
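"Faster-than-realtime" is usually expressed as a real-time factor (RTF): synthesis time divided by the duration of the generated audio, with values below 1 meaning faster than realtime. The sketch below shows the arithmetic. The default hop length of 256 samples and sampling rate of 22050 Hz are values commonly used with LJ Speech in Tacotron 2 setups and are assumptions here, as are the function names and the example numbers.

```python
def audio_seconds(n_mel_frames, hop_length=256, sampling_rate=22050):
    """Duration of audio represented by a mel spectrogram of n_mel_frames frames."""
    return n_mel_frames * hop_length / sampling_rate


def real_time_factor(synthesis_seconds, n_mel_frames, hop_length=256, sampling_rate=22050):
    """Synthesis time divided by audio duration; below 1.0 is faster than realtime."""
    return synthesis_seconds / audio_seconds(n_mel_frames, hop_length, sampling_rate)


# Hypothetical measurement: 860 frames generated in 2.0 seconds.
rtf = real_time_factor(2.0, 860)
is_faster_than_realtime = rtf < 1.0
```

A measured RTF should include every stage that runs at inference time, since the model produces spectrograms and a separate vocoder is needed to obtain a waveform.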

Maintenance & Community

The project appears to be based on NVIDIA's Tacotron 2 implementation, with modifications by deterministic-algorithms-lab. No specific community channels or active maintenance signals are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. It references NVIDIA's Tacotron 2 repository, which is distributed under the BSD 3-Clause license, but this fork's own licensing is unclear. Commercial use would require license clarification.

Limitations & Caveats

The project modifies an existing implementation and may inherit its limitations. The README lists "TODO" items, so some work is unfinished. Setup depends on PyTorch 1.0 and NVIDIA Apex, and training data must follow the expected filelist format with speaker and language IDs.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 2 more.

tacotron2 by NVIDIA

0.0%
5k
PyTorch implementation for text-to-speech synthesis
created 7 years ago
updated 1 year ago