Tacotron 2: PyTorch implementation for text-to-speech synthesis
This PyTorch implementation of Tacotron 2 follows the paper "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions". It is aimed at researchers and developers working on text-to-speech synthesis, offering faster-than-real-time inference and leveraging NVIDIA's Apex and AMP for distributed and mixed-precision training.
How It Works
The system generates mel spectrograms from input text with a sequence-to-sequence model; a vocoder such as WaveGlow then synthesizes audio from those spectrograms. This decouples acoustic modeling from vocoding, allowing each stage to be optimized independently and enabling faster inference. Automatic Mixed Precision (AMP) and distributed training significantly speed up training on multi-GPU setups.
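A minimal sketch of this two-stage pipeline, loosely following the repo's inference.ipynb; the checkpoint filenames are placeholders, and the module names (hparams, train, text) assume the repo's layout at the time of writing:

```python
import numpy as np
import torch

from hparams import create_hparams   # repo module: default hyperparameters
from train import load_model         # repo helper: builds a Tacotron2 model from hparams
from text import text_to_sequence    # repo module: maps text to symbol IDs

# Stage 1: text -> mel spectrogram (Tacotron 2; placeholder checkpoint path)
hparams = create_hparams()
model = load_model(hparams)
model.load_state_dict(torch.load("tacotron2_statedict.pt")["state_dict"])
model.cuda().eval().half()

sequence = np.array(text_to_sequence("Hello, world!", ["english_cleaners"]))[None, :]
sequence = torch.from_numpy(sequence).cuda().long()
mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)

# Stage 2: mel spectrogram -> waveform (WaveGlow vocoder; placeholder checkpoint path)
waveglow = torch.load("waveglow_256channels.pt")["model"]
waveglow.cuda().eval().half()
with torch.no_grad():
    audio = waveglow.infer(mel_outputs_postnet, sigma=0.666)
```

Because the decoder emits mel frames rather than raw samples, the vocoder can be swapped or retrained without touching the acoustic model, which is what makes the faster-than-real-time inference claim achievable.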
Quick Start & Requirements
```bash
pip install -r requirements.txt
```
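After cloning the repo (including its WaveGlow submodule), installing PyTorch 1.0 and Apex, and running the command above, training is launched via train.py. A sketch per the upstream README; the output and log directory names are illustrative:

```bash
# Single-GPU training
python train.py --output_directory=outdir --log_directory=logdir

# Multi-GPU, mixed-precision training (distributed run + AMP)
python -m multiproc train.py --output_directory=outdir --log_directory=logdir \
       --hparams=distributed_run=True,fp16_run=True
```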
Maintenance & Community
This project is maintained by NVIDIA. Related repositories include WaveGlow and nv-wavenet. At the time of writing, the repository is inactive, with the last update about a year ago.
Licensing & Compatibility
The repository is released under the permissive BSD-3-Clause license, which allows commercial use and integration with closed-source projects.
Limitations & Caveats
The implementation targets specific versions of PyTorch (1.0) and NVIDIA Apex, which may require careful environment management. The README also cautions that, when performing mel-spectrogram-to-audio synthesis, Tacotron 2 and the mel decoder (vocoder) must have been trained on the same mel-spectrogram representation; mismatched representations degrade output quality.
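To make the representation constraint concrete, here is a sketch using the repo's TacotronSTFT from layers.py; the parameter values shown match the defaults in hparams.py at the time of writing, but treat them as assumptions to verify against your own checkpoints:

```python
import torch

from layers import TacotronSTFT  # repo module: STFT + mel filterbank wrapper

# These settings define the mel-spectrogram representation. The vocoder
# (e.g., WaveGlow) must have been trained with the same values, or
# mel-to-audio synthesis quality will degrade.
stft = TacotronSTFT(
    filter_length=1024,   # FFT size
    hop_length=256,       # frame shift in samples
    win_length=1024,      # analysis window length
    n_mel_channels=80,    # number of mel bands the decoder predicts
    sampling_rate=22050,  # audio sample rate in Hz
    mel_fmin=0.0,         # lowest mel filterbank frequency
    mel_fmax=8000.0,      # highest mel filterbank frequency
)

# wav: FloatTensor of shape (1, num_samples) with values in [-1, 1]
wav = torch.zeros(1, 22050)
mel = stft.mel_spectrogram(wav)  # shape: (1, 80, num_frames)
```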