tacotron2 by NVIDIA

PyTorch implementation for text-to-speech synthesis

created 7 years ago
5,256 stars

Top 9.7% on sourcepulse

Project Summary

This repository is a PyTorch implementation of Tacotron 2, the system described in the paper "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions". It is aimed at researchers and developers working on text-to-speech synthesis, and it relies on NVIDIA's Apex and AMP for distributed and mixed-precision training. The README describes the related WaveGlow and nv-wavenet vocoders as faster than real time.

How It Works

A sequence-to-sequence model predicts mel spectrograms from input text, and a separate vocoder (such as WaveGlow) converts those spectrograms into audio. Decoupling acoustic modeling from vocoding lets each stage be trained and optimized independently. Automatic Mixed Precision (AMP) and distributed training are supported to speed up training on multi-GPU setups.
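The two-stage split can be illustrated with a minimal sketch. Everything here is hypothetical and is not the repository's API: `toy_acoustic_model` and `toy_vocoder` are stand-ins that only mimic the shapes involved, and the values of `N_MELS` and `HOP_LENGTH` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# A mel spectrogram as a list of frames, each frame a list of mel-band values.
MelSpectrogram = List[List[float]]
Waveform = List[float]

N_MELS = 80        # assumed number of mel bands per frame
HOP_LENGTH = 256   # assumed number of audio samples per spectrogram frame


def toy_acoustic_model(text: str) -> MelSpectrogram:
    """Stand-in for the sequence-to-sequence model: one zero frame per character."""
    return [[0.0] * N_MELS for _ in text]


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    """Stand-in for a vocoder: emits HOP_LENGTH silent samples per frame."""
    return [0.0] * (len(mel) * HOP_LENGTH)


@dataclass
class TTSPipeline:
    """Chains any text-to-mel function with any mel-to-audio function."""
    acoustic_model: Callable[[str], MelSpectrogram]
    vocoder: Callable[[MelSpectrogram], Waveform]

    def synthesize(self, text: str) -> Waveform:
        mel = self.acoustic_model(text)  # stage 1: text -> mel spectrogram
        return self.vocoder(mel)         # stage 2: mel spectrogram -> audio


pipeline = TTSPipeline(toy_acoustic_model, toy_vocoder)
audio = pipeline.synthesize("hello")
print(len(audio))  # 5 frames * 256 samples per frame = 1280
```

Because the pipeline only depends on the two callables, either stage can be swapped without touching the other, which is the practical benefit of the decoupling described above.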

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: NVIDIA GPU with CUDA, cuDNN, PyTorch 1.0, NVIDIA Apex.
  • Dataset: LJSpeech (download and extract).
  • Setup: Inference requires downloading pre-trained models. Training requires configuring the dataset paths.
  • Docs: https://github.com/NVIDIA/tacotron2
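Configuring dataset paths usually means pointing a list of audio files at the directory where the dataset was extracted. The sketch below assumes file lists made of `path|transcript` lines whose paths start with a placeholder directory. The placeholder name `DUMMY`, the line format, and the helper `point_filelist_at_dataset` are assumptions for illustration, so check the repository's own file lists before relying on it.

```python
from typing import Iterable, List

PLACEHOLDER = "DUMMY"  # assumed placeholder directory used in the file lists


def point_filelist_at_dataset(lines: Iterable[str], wav_dir: str,
                              placeholder: str = PLACEHOLDER) -> List[str]:
    """Replace the placeholder directory at the start of each `path|text` line."""
    wav_dir = wav_dir.rstrip("/")
    updated = []
    for line in lines:
        # Split only on the first "|" so the transcript is left untouched.
        path, sep, text = line.partition("|")
        if path.startswith(placeholder + "/"):
            path = wav_dir + path[len(placeholder):]
        updated.append(path + sep + text)
    return updated


example = ["DUMMY/LJ001-0001.wav|Printing, in the only sense"]
print(point_filelist_at_dataset(example, "/data/LJSpeech-1.1/wavs/"))
```

Lines whose path does not start with the placeholder are passed through unchanged, so running the helper twice is harmless.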

Highlighted Details

  • Faster-than-real-time synthesis is attributed in the README to the related WaveGlow and nv-wavenet vocoders.
  • Supports distributed training and Automatic Mixed Precision (AMP) via NVIDIA Apex.
  • Trained on the LJSpeech dataset.
  • Inference demo available via Jupyter notebook.
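"Faster than real time" is commonly quantified with a real-time factor: the time spent synthesizing divided by the duration of the audio produced, where a value below 1 means synthesis outpaces playback. The helper below is a generic sketch of that ratio. The function name and the sample numbers are illustrative and are not measurements of this project.

```python
def real_time_factor(synthesis_seconds: float, num_samples: int,
                     sampling_rate: int) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if num_samples <= 0 or sampling_rate <= 0:
        raise ValueError("num_samples and sampling_rate must be positive")
    audio_seconds = num_samples / sampling_rate
    return synthesis_seconds / audio_seconds


# Illustrative numbers only: 0.5 s of compute for 1 s of audio at 22,050 Hz.
rtf = real_time_factor(synthesis_seconds=0.5, num_samples=22050,
                       sampling_rate=22050)
print(rtf)      # 0.5
print(rtf < 1)  # True: faster than real time under these made-up numbers
```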

Maintenance & Community

This project is maintained by NVIDIA. Related repositories include WaveGlow and nv-wavenet.

Licensing & Compatibility

The repository is released under the BSD 3-Clause license, a permissive license that allows commercial use and integration with closed-source projects.

Limitations & Caveats

The implementation targets specific versions of PyTorch (1.0) and NVIDIA Apex, so reproducing the environment may require careful version management. The README also notes that Tacotron 2 and the mel decoder (the vocoder) should be trained on the same mel-spectrogram representation; models trained on different representations may not be compatible.
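One way to guard against a mel-representation mismatch is to compare the spectrogram settings of the two models before synthesis. The sketch below is a hypothetical check, not part of the repository: the key names in `MEL_KEYS` and the example values are assumptions about which settings define the representation.

```python
from typing import Dict, List

# Assumed names of the settings that define a mel-spectrogram representation.
MEL_KEYS = (
    "sampling_rate", "filter_length", "hop_length", "win_length",
    "n_mel_channels", "mel_fmin", "mel_fmax",
)


def mel_config_mismatches(acoustic: Dict[str, float],
                          vocoder: Dict[str, float]) -> List[str]:
    """Return the mel settings whose values differ or are missing on one side."""
    return [key for key in MEL_KEYS if acoustic.get(key) != vocoder.get(key)]


# Illustrative configurations; the values are examples, not verified defaults.
acoustic_cfg = {"sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
                "win_length": 1024, "n_mel_channels": 80,
                "mel_fmin": 0.0, "mel_fmax": 8000.0}
vocoder_cfg = dict(acoustic_cfg, hop_length=512)

print(mel_config_mismatches(acoustic_cfg, acoustic_cfg))  # []
print(mel_config_mismatches(acoustic_cfg, vocoder_cfg))   # ['hop_length']
```

An empty result only means the listed settings agree. It does not prove the two models were trained on identical features, for example if normalization differs.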

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

44 stars in the last 90 days
