tacotronv2_wavernn_chinese by lturing

TTS pipeline for Chinese speech synthesis

Created 5 years ago

538 stars

Top 59.0% on SourcePulse

Project Summary

This repository implements TacotronV2 and WaveRNN for Chinese text-to-speech synthesis and is aimed at researchers and developers working with Mandarin speech. It converts Chinese text into speech audio and adds support for speaker adaptation and improved long-sentence modeling.

How It Works

The system is a two-stage pipeline: TacotronV2 converts Chinese text into mel-spectrograms, and WaveRNN generates audio waveforms from those mel-spectrograms. Before synthesis, the text is converted from Chinese characters to Pinyin, with handling for polyphonic characters (characters with more than one reading) and numbers. To improve synthesis of long sentences, the project explores alternative attention mechanisms such as Gaussian Mixture (GMM) Attention and Forward Attention. Speaker adaptation is done by fine-tuning a trained TacotronV2 model on data from a new speaker.
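To make the preprocessing step concrete, here is a minimal Python sketch of a text front end that reads digits as Chinese numerals, looks up Pinyin with tone numbers, resolves a polyphonic character from its word context, and passes through tokens that are already Pinyin (the project lists mixed Chinese and Pinyin input as supported). This is not the repository's code: the tiny lexicons, the digit-by-digit reading of numbers, and the names `text_to_pinyin` and `digits_to_hanzi` are hypothetical, and a real front end would rely on a full dictionary and context-aware rules.

```python
import re

# Hypothetical toy lexicon: character -> Pinyin with tone number.
CHAR_PINYIN = {
    "你": "ni3", "好": "hao3", "今": "jin1", "年": "nian2", "是": "shi4",
    "零": "ling2", "一": "yi1", "二": "er4", "三": "san1", "四": "si4",
    "五": "wu3", "六": "liu4", "七": "qi1", "八": "ba1", "九": "jiu3",
    "走": "zou3", "银": "yin2",
}

# Hypothetical word-level entries that fix the reading of polyphonic
# characters: 行 is read hang2 in 银行 (bank) but xing2 in 行走 (walk).
WORD_PINYIN = {
    "银行": ["yin2", "hang2"],
    "行走": ["xing2", "zou3"],
}

DIGITS = "零一二三四五六七八九"

# Digit runs not preceded by a letter, so tone numbers such as the 3 in "ni3" are kept.
NUMBER = re.compile(r"(?<![a-z])\d+")
# A lowercase Pinyin syllable followed by a tone number 1-5.
PINYIN_TOKEN = re.compile(r"[a-z]+[1-5]")


def digits_to_hanzi(match):
    # Read digit by digit, e.g. "2020" -> "二零二零".
    return "".join(DIGITS[int(d)] for d in match.group())


def text_to_pinyin(text):
    text = NUMBER.sub(digits_to_hanzi, text)
    syllables = []
    i = 0
    while i < len(text):
        token = PINYIN_TOKEN.match(text, i)
        if token:  # input that is already Pinyin passes through unchanged
            syllables.append(token.group())
            i = token.end()
            continue
        for word, readings in WORD_PINYIN.items():
            if text.startswith(word, i):  # word context wins over single characters
                syllables.extend(readings)
                i += len(word)
                break
        else:
            if text[i] in CHAR_PINYIN:
                syllables.append(CHAR_PINYIN[text[i]])
            # anything else (spaces, punctuation, unknown characters) is skipped
            i += 1
    return syllables


print(text_to_pinyin("今年是2020年"))  # ['jin1', 'nian2', 'shi4', 'er4', 'ling2', 'er4', 'ling2', 'nian2']
print(text_to_pinyin("ni3好"))  # ['ni3', 'hao3']
```

Reading digits one at a time suits years and phone numbers; quantities such as 120 would need a cardinal reading (一百二十) instead, which this sketch does not attempt.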

Quick Start & Requirements

Install dependencies via requirements.txt.
Run synthesis with python tacotron_synthesize.py --text '...'.
Requires Python with TensorFlow (tested with 1.14.0) and PyTorch.
Preprocessing and training scripts are provided for both TacotronV2 and WaveRNN.

Highlighted Details

Implements TacotronV2 in TensorFlow and WaveRNN in PyTorch.
Offers speaker adaptive training for TacotronV2.
Explores alternative attention mechanisms (GMM, Discretized Graves, Forward Attention) to improve long-sentence modeling.
Supports mixed Chinese and Pinyin input.
Includes TensorFlow Serving + Flask for deployment.
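The GMM-based mechanisms listed above replace content-based scoring with a mixture of distributions over encoder positions whose means can only move forward. The pure-Python sketch below shows one decoder step under stated assumptions: the unconstrained values `raw_w`, `raw_delta` and `raw_sigma` would normally be predicted by the decoder, softmax and softplus are assumed as the constraining functions, and the `discretized` option integrates a logistic distribution over each encoder position in the style of discretized Graves attention. All function and parameter names are hypothetical, not taken from the repository.

```python
import math


def softplus(x):
    # Numerically stable log(1 + e^x); always positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    # Logistic CDF, written to avoid overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gmm_attention_step(prev_mu, raw_w, raw_delta, raw_sigma, num_inputs,
                       discretized=True):
    """One decoder step of GMM-style attention (illustrative sketch).

    prev_mu: per-component means from the previous decoder step.
    raw_w, raw_delta, raw_sigma: unconstrained per-component values that a
        decoder would predict at this step (passed in directly here).
    num_inputs: number of encoder positions to attend over.
    Returns (alignment over encoder positions, updated means).
    """
    weights = softmax(raw_w)
    # A positive step size means each component can only move forward.
    mu = [m + softplus(d) for m, d in zip(prev_mu, raw_delta)]
    # Small floor keeps the scale strictly positive (avoids division by zero).
    scale = [softplus(s) + 1e-6 for s in raw_sigma]
    alignment = []
    for j in range(num_inputs):
        a = 0.0
        for w, m, s in zip(weights, mu, scale):
            if discretized:
                # Mass of a logistic component falling in the bin [j - 0.5, j + 0.5].
                a += w * (sigmoid((j + 0.5 - m) / s) - sigmoid((j - 0.5 - m) / s))
            else:
                # Unnormalized Gaussian bump centred on the component mean.
                a += w * math.exp(-((j - m) ** 2) / (2.0 * s ** 2))
        alignment.append(a)
    return alignment, mu


alignment, mu = gmm_attention_step([2.0], [0.0], [0.0], [0.0], num_inputs=10)
print(alignment.index(max(alignment)))  # 3
```

Feeding the returned means back in at the next step is what keeps the alignment monotonic: because softplus is always positive, the focus never moves backward, a property that tends to help with long sentences.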

Maintenance & Community

Last updated October 2020.
No explicit community links (Discord/Slack) or active maintenance signals are present in the README.

Licensing & Compatibility

The README does not explicitly state a license.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project was last updated in October 2020, so it may have compatibility issues with newer library versions. The README notes that TacotronV2's location-sensitive attention struggles with long sentences, which is why the project explores the alternative attention mechanisms listed above.
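Forward attention, one of those alternatives, constrains the alignment so that at each decoder step attention mass can only stay on its current encoder position or advance by one. Below is a minimal pure-Python sketch of the basic recursion, assuming the ordinary attention distribution for the step has already been computed; the transition-agent extension is omitted, and the function name `forward_attention_step` is hypothetical.

```python
def forward_attention_step(prev_alpha, attention_probs):
    """One decoder step of basic forward attention (illustrative sketch).

    prev_alpha: alignment from the previous decoder step (sums to 1).
    attention_probs: ordinary attention distribution computed at this step,
        e.g. a softmax over content- or location-based scores.
    Returns the new alignment, renormalized to sum to 1.
    """
    new_alpha = []
    for n, y in enumerate(attention_probs):
        # Mass can stay at position n or arrive from position n - 1, never jump further.
        carried = prev_alpha[n] + (prev_alpha[n - 1] if n > 0 else 0.0)
        new_alpha.append(carried * y)
    total = sum(new_alpha)
    if total == 0.0:
        # Safeguard added for this sketch: keep the old alignment if nothing is reachable.
        return list(prev_alpha)
    return [a / total for a in new_alpha]


# The alignment starts on the first encoder position.
alpha = [1.0, 0.0, 0.0, 0.0, 0.0]
# Although this step's attention peaks at the last position, mass cannot skip ahead.
alpha = forward_attention_step(alpha, [0.05, 0.05, 0.05, 0.05, 0.8])
print(alpha)  # [0.5, 0.5, 0.0, 0.0, 0.0]
```

Because the alignment can advance at most one position per decoder step, it cannot jump ahead or fall back to earlier text, which is the kind of failure location-sensitive attention is prone to on long inputs.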


Explore Similar Projects

Cross-Lingual-Voice-Cloning by deterministic-algorithms-lab

FastDiff by Rongjiehuang

speech-synthesis-paper by wenet-e2e

flowtron by NVIDIA

TransformerTTS by spring-media

MARS5-TTS by Camb-ai

hifi-gan by jik876

metavoice-src by metavoiceio

VITS-fast-fine-tuning by Plachtaa

VALL-E-X by Plachtaa

Spark-TTS by SparkAudio

GPT-SoVITS by RVC-Boss