Tacotron 2 implementation for multilingual speech synthesis research
This repository provides an implementation of Tacotron 2 for multilingual text-to-speech (TTS) synthesis, supporting parameter sharing, code-switching, and voice cloning. It targets researchers and developers building TTS systems that must be trained on several languages or must synthesize mixed-language speech. Its main feature is a configurable scheme for sharing encoder parameters across languages, trading off model compactness against per-language flexibility.
How It Works
The core of the implementation is a Tacotron 2 architecture adapted for multilingualism. It explores three parameter-sharing strategies for the encoder: full sharing with an adversarial classifier to remove speaker information, language-specific encoders, and a hybrid approach using a parameter generator for language-specific encoder parameters. This hybrid method, combined with domain adversarial training, aims to achieve effective parameter sharing while retaining flexibility for different languages and voices.
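The hybrid strategy can be illustrated with a minimal, dependency-free sketch. Everything here is a hypothetical illustration, not the repository's code: the class `ParameterGenerator`, the function `encode`, and all dimensions are assumptions. A single linear map stands in for an encoder layer, and the generator is one shared linear map from a per-language embedding to that layer's weights. The adversarial classifier and all training are omitted. The point is that one shared generator yields different weights per language, so a code-switched sentence can use different encoder parameters for each token's language.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


class ParameterGenerator:
    """Hypothetical sketch: a shared generator maps a per-language embedding
    to the weights of a small language-specific linear layer."""

    def __init__(self, languages: List[str], emb_dim: int, in_dim: int,
                 out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        # One embedding per language (randomly initialized; no training here).
        self.lang_emb = {lang: [rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                         for lang in languages}
        # Shared generator: one row per generated weight, each row of size emb_dim.
        n_params = in_dim * out_dim
        self.gen = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                    for _ in range(n_params)]

    def weights_for(self, lang: str) -> Matrix:
        e = self.lang_emb[lang]
        # Generate a flat weight vector from the language embedding.
        flat = [sum(g * x for g, x in zip(row, e)) for row in self.gen]
        # Reshape it into an out_dim x in_dim matrix.
        return [flat[i * self.in_dim:(i + 1) * self.in_dim]
                for i in range(self.out_dim)]


def encode(tokens: List[List[float]], langs: List[str],
           pg: ParameterGenerator) -> List[List[float]]:
    """Apply language-specific generated weights token by token, so a
    code-switched input uses each token's own language parameters."""
    cache: Dict[str, Matrix] = {}
    out = []
    for x, lang in zip(tokens, langs):
        if lang not in cache:
            cache[lang] = pg.weights_for(lang)
        w = cache[lang]
        out.append([sum(wi * xi for wi, xi in zip(row, x)) for row in w])
    return out


if __name__ == "__main__":
    pg = ParameterGenerator(["en", "de"], emb_dim=4, in_dim=3, out_dim=2)
    # Tokens 0 and 2 share the same input vector but are tagged with different languages.
    sentence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
    h = encode(sentence, ["en", "en", "de"], pg)
    print(len(h), len(h[0]))  # 3 2
```

Because only the small per-language embeddings differ while the generator is shared, most parameters are shared across languages even though each language ends up with its own effective encoder weights.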
Quick Start & Requirements
pip3 install -r requirements.txt
python3 prepare_css_spectrograms.py
PYTHONIOENCODING=utf-8 python3 train.py --hyper_parameters generated_switching.json
tensorboard --logdir logs --port 6666
Limitations & Caveats
The README points to a separate WaveRNN vocoder repository for converting the predicted spectrograms into audio, which suggests the vocoder must be installed separately. Training also requires substantial compute and prior dataset preparation (the spectrogram-preparation step above).
1 year ago
Inactive