PyTorch for cross-lingual voice cloning research
This repository provides a PyTorch implementation of Tacotron 2, modified for cross-lingual voice cloning. It enables faster-than-realtime inference and is suitable for researchers and developers working on multilingual speech synthesis and voice conversion.
How It Works
The model is based on the Tacotron 2 architecture, a sequence-to-sequence model for text-to-spectrogram synthesis. It has been adapted to handle multiple languages and speakers by incorporating speaker and language embeddings. The training process requires dataset files mapping audio paths to corresponding text, speaker IDs, and language IDs.
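As a sketch, a training filelist entry might pair each audio path with its transcript, speaker ID, and language ID. The pipe-delimited layout below is an assumption based on common Tacotron 2 filelist conventions, not something this README specifies:

```python
# Hypothetical filelist parser. The layout
# audio_path|text|speaker_id|language_id is an assumption,
# not confirmed by this repository's README.
def parse_filelist_line(line):
    audio_path, text, speaker_id, language_id = line.strip().split("|")
    return {
        "audio_path": audio_path,
        "text": text,
        "speaker_id": int(speaker_id),
        "language_id": int(language_id),
    }

entry = parse_filelist_line("wavs/utt_0001.wav|hello world|3|1\n")
print(entry["speaker_id"], entry["language_id"])  # 3 1
```

The speaker and language IDs would then index the corresponding embedding tables that condition the decoder.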
Quick Start & Requirements
Install dependencies:
pip install -r requirements.txt
Update the .wav paths in filelists/*.txt to point at your audio files, or set load_mel_from_disk=True to load precomputed mel spectrograms instead of computing them from audio.
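With load_mel_from_disk enabled, the data loader would open a precomputed mel file rather than the raw waveform. A minimal sketch of that toggle, where the .pt extension and same-stem path convention are assumptions rather than documented behavior:

```python
from pathlib import Path

# Hypothetical helper: decide what the data loader should open for one
# filelist entry. The .pt mel extension is an assumption; the actual
# repository may use a different naming convention.
def resolve_input_path(audio_path, load_mel_from_disk=False):
    path = Path(audio_path)
    if load_mel_from_disk:
        return path.with_suffix(".pt")  # precomputed mel spectrogram
    return path                         # raw waveform; mel computed on the fly

print(resolve_input_path("wavs/utt_0001.wav", load_mel_from_disk=True))
# wavs/utt_0001.pt
```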
Maintenance & Community
The project appears to be based on NVIDIA's Tacotron 2 implementation, with modifications by deterministic-algorithms-lab. No specific community channels or active maintenance signals are present in the README.
Licensing & Compatibility
The README does not explicitly state a license. It references NVIDIA's Tacotron 2 repository, which is typically under a permissive license (e.g., Apache 2.0), but this specific fork's licensing is unclear. Compatibility for commercial use would require license clarification.
Limitations & Caveats
The project is a modification of an existing implementation and may inherit its limitations. The README mentions "TODO" items, indicating ongoing development. The setup requires specific versions of PyTorch and Apex, and the dataset format is strict.
Last activity: 2 years ago (inactive).