flowtron by NVIDIA

TTS research paper using a flow-based generative network

Created 5 years ago

900 stars

Top 40.1% on SourcePulse


1 Expert Loves This Project

Anastasis Germanidis

Cofounder of Runway

Project Summary

Flowtron is an autoregressive, flow-based generative network for text-to-speech (TTS) synthesis, designed for researchers and developers seeking high-quality, expressive speech with controllable variations. It offers style transfer capabilities and aims to match state-of-the-art TTS models in speech quality.

How It Works

Flowtron builds upon Tacotron and autoregressive flows, learning an invertible mapping from data to a latent space. This latent space can be manipulated to control speech characteristics such as pitch, tone, speech rate, and accent. Training maximizes the likelihood of the training data, which keeps the procedure simple and stable.
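
The invertible mapping and likelihood objective described above can be illustrated with a toy affine autoregressive flow. This is a minimal sketch, not Flowtron's code: the `conditioner` function is a hypothetical stand-in for the neural network, the data is a short 1-D sequence rather than mel-spectrogram frames, and a standard Gaussian prior over the latent is assumed.

```python
import numpy as np

def conditioner(prev):
    """Hypothetical stand-in for the network: maps past values to (mu, log_sigma)."""
    if prev.size == 0:
        return 0.0, 0.0
    mu = 0.5 * prev[-1]
    log_sigma = 0.1 * np.tanh(prev.mean())
    return mu, log_sigma

def forward(x):
    """Data -> latent. Each step depends only on earlier data values."""
    z = np.empty_like(x)
    log_det = 0.0
    for t in range(len(x)):
        mu, log_sigma = conditioner(x[:t])
        z[t] = (x[t] - mu) * np.exp(-log_sigma)
        log_det -= log_sigma  # log |dz_t / dx_t|
    return z, log_det

def inverse(z):
    """Latent -> data. Runs sequentially, reusing already generated values."""
    x = np.empty_like(z)
    for t in range(len(z)):
        mu, log_sigma = conditioner(x[:t])
        x[t] = z[t] * np.exp(log_sigma) + mu
    return x

def log_likelihood(x):
    """Change of variables: log p(x) = log N(z; 0, I) + log |det dz/dx|."""
    z, log_det = forward(x)
    log_prior = -0.5 * np.sum(z ** 2) - 0.5 * len(z) * np.log(2.0 * np.pi)
    return log_prior + log_det

x = np.array([0.3, -1.2, 0.7, 2.0])
z, _ = forward(x)
x_rec = inverse(z)        # recovers x up to floating-point error
ll = log_likelihood(x)    # the quantity that training would maximize
```

Because the mapping is exactly invertible, the likelihood is computed in closed form, so the training loss is simply the negative of `log_likelihood` with no adversarial or auxiliary terms.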

Quick Start & Requirements

Install: Clone the repository, initialize submodules (git submodule update --init), and install the requirements (pip install -r requirements.txt).
Prerequisites: NVIDIA GPU with CUDA and cuDNN.
Resources: Training requires a dataset and a configuration. The README links to pre-trained models (LJS, LibriTTS).
Docs: A website hosts audio samples.

Highlighted Details

Autoregressive flow-based generative network for TTS.
Control over speech variation, interpolation, and style transfer.
Matches state-of-the-art TTS models in Mean Opinion Scores (MOS).
Supports multi-GPU and Automatic Mixed Precision (AMP) training.
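
The variation and interpolation items above come down to operations on latent vectors. The following is a minimal sketch, assuming a zero-mean Gaussian prior over the latent space; the helper names `sample_latent` and `interpolate` and the shape `(80, 100)` are hypothetical and not taken from the repository. Turning a latent into audio would additionally require the trained model's inverse pass, which is not shown.

```python
import numpy as np

def sample_latent(shape, sigma, rng):
    """Draw a latent from N(0, sigma^2 I); a larger sigma spreads samples further from the mean."""
    return sigma * rng.standard_normal(shape)

def interpolate(z_a, z_b, alpha):
    """Linear blend between two latents: alpha=0 gives z_a, alpha=1 gives z_b."""
    return (1.0 - alpha) * z_a + alpha * z_b

rng = np.random.default_rng(0)
shape = (80, 100)  # hypothetical: feature channels x frames

z_flat = sample_latent(shape, 0.0, rng)    # no variation: every entry is zero
z_narrow = sample_latent(shape, 0.5, rng)  # moderate variation
z_wide = sample_latent(shape, 1.0, rng)    # more variation

# Five evenly spaced points on the path from one latent to the other.
path = [interpolate(z_narrow, z_wide, a) for a in np.linspace(0.0, 1.0, 5)]
```

Keeping the sampling scale as an explicit argument makes the trade-off visible: a value near zero stays close to the mean of the prior, while larger values yield more diverse samples.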

Maintenance & Community

This project is from NVIDIA. No specific community links or roadmap are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. It mentions using code from other repositories, so the licenses of those repositories may also apply. Suitability for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be research-oriented, and extensive fine-tuning or specific dataset preparation might be required for optimal performance.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days