Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab

PyTorch for cross-lingual voice cloning research

Created 5 years ago
360 stars

Top 77.7% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of Tacotron 2, modified for cross-lingual voice cloning. It enables faster-than-realtime inference and is suitable for researchers and developers working on multilingual speech synthesis and voice conversion.

How It Works

The model is based on the Tacotron 2 architecture, a sequence-to-sequence model for text-to-spectrogram synthesis. It has been adapted to handle multiple languages and speakers by incorporating speaker and language embeddings. The training process requires dataset files mapping audio paths to corresponding text, speaker IDs, and language IDs.
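As an illustration of the dataset files described above, here is a minimal sketch of a parser. It assumes one utterance per line with pipe-separated fields in the order audio path, text, speaker ID, language ID. The delimiter and field order are assumptions modeled on the filelist convention of NVIDIA's Tacotron 2 and are not confirmed by this summary; the names `Utterance`, `parse_filelist_line`, and `load_filelist` are hypothetical.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    """One training example (hypothetical container, not from the repository)."""
    audio_path: str
    text: str
    speaker_id: int
    language_id: int


def parse_filelist_line(line: str, delimiter: str = "|") -> Utterance:
    """Parse one line of an assumed `path|text|speaker|language` filelist."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    audio_path, text, speaker, language = fields
    return Utterance(audio_path, text, int(speaker), int(language))


def load_filelist(lines):
    """Parse every non-empty line from an iterable of filelist lines."""
    return [parse_filelist_line(line) for line in lines if line.strip()]


example = parse_filelist_line("wavs/LJ001-0001.wav|Printing, in the only sense|0|1")
print(example.speaker_id, example.language_id)
```

Rejecting lines with the wrong field count early tends to surface formatting mistakes before training starts, which matters when every utterance must carry both a speaker ID and a language ID.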

Quick Start & Requirements

  • Install: Clone the repository and install requirements: pip install -r requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA and cuDNN. PyTorch 1.0 and Apex are also required.
  • Dataset: Download and extract the LJ Speech dataset. Update the .wav paths in filelists/*.txt to point at the extracted audio, or set load_mel_from_disk=True to load precomputed mel-spectrograms from disk instead.
  • Links: NVIDIA Tacotron2 (inspiration), WaveGlow (related).
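The path update in the dataset step can be sketched as a small prefix rewrite. This is an assumption-laden example: it supposes that the audio path is the first pipe-separated field on each line and that the shipped filelists use a placeholder prefix (here `DUMMY`, as in the upstream NVIDIA filelists). The function name `rewrite_audio_paths` and the target directory are hypothetical.

```python
def rewrite_audio_paths(lines, old_prefix, new_prefix, delimiter="|"):
    """Replace the leading path prefix of the first field on each line.

    Lines whose first field does not start with `old_prefix` are kept as-is.
    """
    rewritten = []
    for line in lines:
        # Split off the first field only; the rest of the line is untouched.
        path, sep, rest = line.partition(delimiter)
        if path.startswith(old_prefix):
            path = new_prefix + path[len(old_prefix):]
        rewritten.append(path + sep + rest)
    return rewritten


# Assumed placeholder prefix and a hypothetical local dataset location.
lines = ["DUMMY/LJ001-0001.wav|some text|0|0\n"]
updated = rewrite_audio_paths(lines, "DUMMY", "/data/LJSpeech-1.1/wavs")
print(updated[0], end="")
```

Only the first field is rewritten, so transcript text that happens to contain the placeholder string is left alone.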

Highlighted Details

  • Faster-than-realtime inference.
  • Supports distributed and automatic mixed-precision training via NVIDIA Apex.
  • Modified for cross-lingual voice cloning.
  • Training requires specific dataset formatting with speaker and language IDs.
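"Faster-than-realtime" is usually quantified with a real-time factor (RTF): synthesis time divided by the duration of the audio produced, with values below 1 meaning faster than realtime. The sketch below shows the arithmetic only. The default hop length of 256 samples and sampling rate of 22050 Hz are assumptions taken from common Tacotron 2 configurations, not from this summary, and the function names are hypothetical.

```python
def audio_duration_seconds(num_mel_frames, hop_length=256, sampling_rate=22050):
    """Duration of audio covered by a mel-spectrogram of `num_mel_frames` frames."""
    return num_mel_frames * hop_length / sampling_rate


def real_time_factor(synthesis_seconds, num_mel_frames,
                     hop_length=256, sampling_rate=22050):
    """Synthesis time divided by audio duration; below 1.0 is faster than realtime."""
    duration = audio_duration_seconds(num_mel_frames, hop_length, sampling_rate)
    return synthesis_seconds / duration


# Hypothetical measurement: 862 mel frames generated in 2.0 seconds.
rtf = real_time_factor(2.0, 862)
print(f"RTF = {rtf:.3f}, faster than realtime: {rtf < 1.0}")
```

A complete measurement would also include vocoder time (for example WaveGlow), since the spectrogram alone is not yet audio.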

Maintenance & Community

The project appears to be based on NVIDIA's Tacotron 2 implementation, with modifications by deterministic-algorithms-lab. No specific community channels or active maintenance signals are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. It references NVIDIA's Tacotron 2 repository, which is released under the permissive BSD 3-Clause license, but the licensing of this specific fork is unclear. Commercial use would require clarification from the maintainers.

Limitations & Caveats

The project is a modification of an existing implementation and may inherit its limitations. The README lists "TODO" items, so some planned work is unfinished. Setup depends on PyTorch 1.0 and NVIDIA Apex, and training data must follow the expected filelist format, including speaker and language IDs.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
