TTS pipeline for Chinese speech synthesis
Top 59.4% on SourcePulse
This repository provides an implementation of TacotronV2 and WaveRNN for Chinese text-to-speech synthesis, targeting researchers and developers working with Mandarin speech. It enables the generation of high-quality, natural-sounding Chinese speech from text, with features for speaker adaptation and improved long-sentence modeling.
How It Works
The system combines TacotronV2 for converting Chinese text to mel-spectrograms and WaveRNN for generating audio waveforms from these mel-spectrograms. It preprocesses Chinese text by converting characters to Pinyin, handling multi-syllable words and numbers. For improved long-sentence synthesis, it explores alternative attention mechanisms like Gaussian Mixture Attention and Forward Attention. Speaker adaptation is achieved by fine-tuning TacotronV2 on new data.
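The text-preprocessing step described above can be illustrated with a minimal sketch. It is not the repository's actual front end: the three lexicons below are tiny hypothetical stand-ins for a real pronunciation dictionary, the function name `text_to_pinyin` is invented for illustration, and reading numbers digit by digit is an assumption made for brevity. The sketch uses greedy longest-match lookup so that a character inside a multi-character word gets the reading that word requires.

```python
# Toy lexicons: hypothetical, far smaller than any real pronunciation dictionary.
WORD_PINYIN = {
    "银行": ["yin2", "hang2"],  # 行 reads "hang2" inside this word
}
CHAR_PINYIN = {
    "我": ["wo3"], "去": ["qu4"], "银": ["yin2"], "行": ["xing2"],
}
DIGIT_PINYIN = {
    "0": "ling2", "1": "yi1", "2": "er4", "3": "san1", "4": "si4",
    "5": "wu3", "6": "liu4", "7": "qi1", "8": "ba1", "9": "jiu3",
}
MAX_WORD_LEN = max(len(word) for word in WORD_PINYIN)


def text_to_pinyin(text):
    """Convert text to tone-numbered Pinyin using greedy longest-match lookup."""
    out = []
    i = 0
    while i < len(text):
        # Try multi-character words first, longest candidate first.
        for n in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            word = text[i:i + n]
            if word in WORD_PINYIN:
                out.extend(WORD_PINYIN[word])
                i += n
                break
        else:
            # No word matched: fall back to a single digit or character.
            ch = text[i]
            if ch in DIGIT_PINYIN:
                out.append(DIGIT_PINYIN[ch])  # assumption: digits read one by one
            elif ch in CHAR_PINYIN:
                out.extend(CHAR_PINYIN[ch])
            else:
                out.append(ch)  # pass unknown symbols through unchanged
            i += 1
    return out


print(" ".join(text_to_pinyin("我去银行3")))
```

Word-level lookup before character-level lookup is what lets 行 come out as `hang2` in 银行 but as `xing2` on its own. A production front end would also need word segmentation and proper number normalization, both of which this sketch omits.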
Quick Start & Requirements
Dependencies are listed in requirements.txt.
Synthesis command: python tacotron_synthesize.py --text '...'
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project was last updated in October 2020, so it may not run against newer library versions without changes. The README notes that location-sensitive attention in TacotronV2 struggles with long sentences; Gaussian Mixture Attention and Forward Attention are explored as alternatives.