Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab

PyTorch for cross-lingual voice cloning research

Created 5 years ago
360 stars

Top 77.7% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of Tacotron 2, modified for cross-lingual voice cloning. It enables faster-than-realtime inference and is suitable for researchers and developers working on multilingual speech synthesis and voice conversion.

How It Works

The model is based on the Tacotron 2 architecture, a sequence-to-sequence model that predicts mel-spectrograms from text. It has been adapted to handle multiple languages and speakers by adding speaker and language embeddings. Training requires dataset filelists in which each entry maps an audio path to its transcript, speaker ID, and language ID.
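To make the conditioning idea concrete, here is a minimal, framework-free sketch of one common scheme: look up a speaker vector and a language vector, then concatenate both onto every encoder output frame before attention. The concatenation approach, the table sizes, and the helper names (`make_table`, `condition_encoder_outputs`) are illustrative assumptions, not necessarily what this repository does; a real model would use learned `torch.nn.Embedding` layers instead of random lists.

```python
import random
from typing import List

Vector = List[float]


def make_table(num_entries: int, dim: int, seed: int) -> List[Vector]:
    """Hypothetical embedding table: one random vector per ID (learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_entries)]


def condition_encoder_outputs(
    encoder_outputs: List[Vector],
    speaker_id: int,
    language_id: int,
    speaker_table: List[Vector],
    language_table: List[Vector],
) -> List[Vector]:
    """Append the speaker and language embeddings to every encoder frame."""
    speaker_vec = speaker_table[speaker_id]
    language_vec = language_table[language_id]
    # List concatenation builds a new, wider frame; the inputs are left unchanged.
    return [frame + speaker_vec + language_vec for frame in encoder_outputs]


# Toy example: 3 encoder frames of width 4, 2 speakers (dim 2), 2 languages (dim 1).
speakers = make_table(num_entries=2, dim=2, seed=0)
languages = make_table(num_entries=2, dim=1, seed=1)
frames = [[0.0] * 4 for _ in range(3)]
conditioned = condition_encoder_outputs(
    frames, speaker_id=1, language_id=0,
    speaker_table=speakers, language_table=languages,
)
print(len(conditioned), len(conditioned[0]))  # → 3 7
```

In this scheme, cross-lingual cloning amounts to keeping `speaker_id` fixed while changing `language_id` at inference time, so the decoder sees the same voice identity paired with a different language.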

Quick Start & Requirements

  • Install: Clone the repository, then install the requirements with pip install -r requirements.txt.
  • Prerequisites: NVIDIA GPU with CUDA and cuDNN. PyTorch 1.0 and Apex are also required.
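  • Dataset: Download and extract the LJ Speech dataset. Either update the .wav paths in filelists/*.txt to point at the extracted files, or set load_mel_from_disk=True and point the filelists at precomputed mel-spectrograms instead.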
  • Links: NVIDIA Tacotron2 (inspiration), WaveGlow (related).
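The filelist preparation from the Dataset step can be sketched in Python. This assumes a pipe-separated line layout of `wav_path|text|speaker_id|language_id` and a placeholder directory prefix such as `DUMMY` (the convention in NVIDIA's Tacotron 2 filelists); both are assumptions to check against the actual files in `filelists/`, and `parse_filelist_line` / `rewrite_wav_paths` are hypothetical helpers, not part of the repository.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Utterance:
    wav_path: str
    text: str
    speaker_id: int
    language_id: int


def parse_filelist_line(line: str) -> Utterance:
    """Parse one line in the assumed `wav_path|text|speaker_id|language_id` layout."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}: {line!r}")
    wav_path, text, speaker, language = fields
    return Utterance(wav_path, text, int(speaker), int(language))


def rewrite_wav_paths(lines: Iterable[str], placeholder: str, dataset_dir: str) -> List[str]:
    """Replace a leading placeholder directory with the real dataset location."""
    out = []
    for line in lines:
        utt = parse_filelist_line(line)  # validates the line before rewriting it
        path = utt.wav_path
        if path.startswith(placeholder + "/"):
            # Keep the "/file.wav" suffix and swap only the directory prefix.
            path = dataset_dir.rstrip("/") + path[len(placeholder):]
        out.append("|".join([path, utt.text, str(utt.speaker_id), str(utt.language_id)]))
    return out


# Usage with a made-up line; in practice read and rewrite each filelists/*.txt file.
lines = ["DUMMY/LJ001-0001.wav|hello world|0|0\n"]
print(rewrite_wav_paths(lines, "DUMMY", "/data/LJSpeech-1.1/wavs"))
# → ['/data/LJSpeech-1.1/wavs/LJ001-0001.wav|hello world|0|0']
```

Parsing every line before rewriting it also surfaces malformed entries early, which helps given how strict the expected format is.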

Highlighted Details

  • Faster-than-realtime inference.
  • Supports distributed and automatic mixed-precision training via NVIDIA Apex.
  • Modified for cross-lingual voice cloning.
  • Training requires specific dataset formatting with speaker and language IDs.
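"Faster than realtime" is usually quantified with the real-time factor (RTF): wall-clock synthesis time divided by the duration of the generated audio, so RTF < 1 means audio is produced faster than it plays back. The sketch below assumes a 22,050 Hz output rate (LJ Speech's sampling rate) and uses a hypothetical `fake_synthesize` stand-in; in practice you would time the full Tacotron 2 plus vocoder call instead.

```python
import time
from typing import Callable, List

SAMPLE_RATE = 22050  # Hz; LJ Speech's sampling rate, assumed for the output audio


def real_time_factor(synthesize: Callable[[str], List[float]], text: str,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """Return synthesis wall-clock time divided by the duration of the produced audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> List[float]:
    """Hypothetical stand-in: returns 0.1 s of silence per input character."""
    return [0.0] * (len(text) * SAMPLE_RATE // 10)


rtf = real_time_factor(fake_synthesize, "hello world")
print(f"RTF = {rtf:.4f} ({'faster' if rtf < 1 else 'slower'} than realtime)")
```

When timing on a GPU, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously and the measured time would otherwise be too low.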

Maintenance & Community

The project appears to be based on NVIDIA's Tacotron 2 implementation, with modifications by deterministic-algorithms-lab. No specific community channels or active maintenance signals are present in the README.

Licensing & Compatibility

The README does not state a license. The project derives from NVIDIA's Tacotron 2 repository, which is released under the permissive BSD 3-Clause license, but this fork's own licensing is unclear, so commercial use would require clarifying the license first.

Limitations & Caveats

Because the project modifies an existing implementation, it likely inherits that implementation's limitations. The README lists "TODO" items, indicating unfinished work; with the last commit two years ago, active development should not be assumed. Setup requires specific, now dated versions of PyTorch (1.0) and Apex, and the dataset filelists must follow a strict format.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

0.1%
4k
TTS model for human-like, expressive speech
Created 1 year ago
Updated 1 year ago
Starred by Jiaming Song (Chief Scientist at Luma AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

fish-speech by fishaudio

0.4%
24k
Open-source TTS for multilingual speech synthesis
Created 2 years ago
Updated 23 hours ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Junyang Lin (Core Maintainer at Alibaba Qwen), and 6 more.

OpenVoice by myshell-ai

0.4%
35k
Audio foundation model for versatile, instant voice cloning
Created 1 year ago
Updated 6 months ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

0.3%
52k
Few-shot voice cloning and TTS web UI
Created 1 year ago
Updated 1 month ago