tacotronv2_wavernn_chinese  by lturing

TTS pipeline for Chinese speech synthesis

Created 5 years ago
536 stars

Top 59.4% on SourcePulse

Project Summary

This repository provides an implementation of TacotronV2 and WaveRNN for Chinese text-to-speech synthesis, targeting researchers and developers working with Mandarin speech. It enables the generation of high-quality, natural-sounding Chinese speech from text, with features for speaker adaptation and improved long-sentence modeling.

How It Works

The system combines TacotronV2 for converting Chinese text to mel-spectrograms and WaveRNN for generating audio waveforms from these mel-spectrograms. It preprocesses Chinese text by converting characters to Pinyin, handling multi-syllable words and numbers. For improved long-sentence synthesis, it explores alternative attention mechanisms like Gaussian Mixture Attention and Forward Attention. Speaker adaptation is achieved by fine-tuning TacotronV2 on new data.
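The text-preprocessing step can be illustrated with a minimal sketch. This is not the repository's code: the lookup tables `DIGITS` and `CHAR_PINYIN` and the functions `spell_digits` and `to_pinyin` are hypothetical names invented for illustration, and the tables cover only a handful of characters. The sketch reads digits one at a time, which ignores place values (25 would normally be read 二十五) and does not resolve characters with more than one pronunciation; a real front end needs a full lexicon and context rules.

```python
# Hypothetical toy tables; a real system uses a full pronunciation lexicon.
DIGITS = {
    "0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
    "5": "五", "6": "六", "7": "七", "8": "八", "9": "九",
}
CHAR_PINYIN = {
    "我": "wo3", "有": "you3", "本": "ben3", "书": "shu1",
    "零": "ling2", "一": "yi1", "二": "er4", "三": "san1", "四": "si4",
    "五": "wu3", "六": "liu4", "七": "qi1", "八": "ba1", "九": "jiu3",
}


def spell_digits(text):
    """Replace each ASCII digit with its Chinese numeral, digit by digit."""
    return "".join(DIGITS.get(ch, ch) for ch in text)


def to_pinyin(text):
    """Map each character to tone-numbered Pinyin.

    Characters missing from the toy table pass through unchanged.
    """
    return [CHAR_PINYIN.get(ch, ch) for ch in spell_digits(text)]


if __name__ == "__main__":
    print(" ".join(to_pinyin("我有3本书")))
```

The resulting Pinyin sequence is the kind of symbol stream an acoustic model such as TacotronV2 would take as input in place of raw characters.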

Quick Start & Requirements

  • Install dependencies via requirements.txt.
  • Run synthesis with python tacotron_synthesize.py --text '...'.
  • Requires Python with either TensorFlow (tested with 1.14.0) or PyTorch.
  • Preprocessing and training scripts are provided for both TacotronV2 and WaveRNN.

Highlighted Details

  • Implements TacotronV2 in both PyTorch and TensorFlow.
  • Offers speaker adaptive training for TacotronV2.
  • Explores alternative attention mechanisms (GMM, Discretized Graves, Forward Attention) to improve long-sentence modeling.
  • Supports mixed Chinese and Pinyin input.
  • Includes TensorFlow Serving + Flask for deployment.
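
To show why a Gaussian mixture (GMM) attention can help with long sentences, here is a minimal pure-Python sketch of one decoder step in the style of Graves attention. It is not taken from the repository: the function `gmm_attention_step` and its parameter names are hypothetical, the raw inputs stand in for values a network would predict, and normalising the weights over encoder positions is one design choice among several variants. The key property is that each mixture mean advances by a non-negative softplus increment, so the attention window can only move forward through the input.

```python
import math


def softplus(x):
    """Smooth, strictly positive transform: log(1 + exp(x))."""
    return math.log1p(math.exp(x))


def gmm_attention_step(prev_means, raw_deltas, raw_scales, raw_weights,
                       num_encoder_steps):
    """One GMM attention step (illustrative sketch, not the repo's code).

    Returns the updated mixture means and the attention weights over
    encoder positions 0 .. num_encoder_steps - 1.
    """
    # Means move forward by a positive amount, which enforces monotonicity.
    means = [m + softplus(d) for m, d in zip(prev_means, raw_deltas)]
    # Standard deviations must be positive; the small floor avoids collapse.
    scales = [softplus(s) + 1e-3 for s in raw_scales]
    # Softmax over the mixture weights.
    peak = max(raw_weights)
    exps = [math.exp(w - peak) for w in raw_weights]
    total = sum(exps)
    weights = [e / total for e in exps]

    align = []
    for j in range(num_encoder_steps):
        score = sum(
            w * math.exp(-0.5 * ((j - mu) / sd) ** 2)
            / (sd * math.sqrt(2.0 * math.pi))
            for w, mu, sd in zip(weights, means, scales)
        )
        align.append(score)
    # Normalise so the weights sum to 1 (guard against an all-zero row).
    norm = sum(align) or 1.0
    return means, [a / norm for a in align]


if __name__ == "__main__":
    means = [0.0, 0.0]
    for step in range(3):
        means, align = gmm_attention_step(
            means, [0.5, 1.0], [0.0, 0.0], [0.0, 0.0], 10)
        print(step, align.index(max(align)))
```

Because the attention position depends only on accumulated offsets rather than on content matching, this family of mechanisms cannot jump backwards or repeat earlier input, which is the failure mode usually associated with long inputs.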

Maintenance & Community

  • Last updated October 2020.
  • No explicit community links (Discord/Slack) or active maintenance signals are present in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project was last updated in October 2020, so it may not work with newer library versions without changes. The README notes that location-sensitive attention in TacotronV2 struggles with long sentences; the alternative attention mechanisms listed above are explored as a remedy.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
