TTS research paper using flow-based generative network
Flowtron is an autoregressive, flow-based generative network for text-to-speech (TTS) synthesis, designed for researchers and developers seeking high-quality, expressive speech with controllable variations. It offers style transfer capabilities and aims to match state-of-the-art TTS models in speech quality.
How It Works
Flowtron builds upon Tacotron and autoregressive flows, creating an invertible mapping from data to a latent space. This latent space can be manipulated to control speech characteristics like pitch, tone, speech rate, and accent. The model is optimized by maximizing the likelihood of the training data, ensuring simple and stable training.
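The invertible mapping and the likelihood objective described above can be illustrated with a toy sketch. This is not Flowtron's code: scalars stand in for mel-spectrogram frames, and the `conditioner` function is a hypothetical fixed formula standing in for the trained network that would predict the scale and shift from earlier frames. The function names `forward`, `inverse`, and `log_likelihood` are likewise chosen here for illustration.

```python
import math

def conditioner(prev):
    """Hypothetical stand-in for the network: previous frame -> (log_scale, shift)."""
    return 0.1 * math.tanh(prev), 0.5 * prev

def forward(x):
    """Map data to latent space; also return log|det Jacobian| of the mapping."""
    z, log_det, prev = [], 0.0, 0.0
    for x_t in x:
        log_s, b = conditioner(prev)           # depends only on earlier data
        z.append((x_t - b) * math.exp(-log_s)) # affine transform of this step
        log_det -= log_s                       # d z_t / d x_t = exp(-log_s)
        prev = x_t
    return z, log_det

def inverse(z):
    """Map latent values back to data, one step at a time (autoregressive)."""
    x, prev = [], 0.0
    for z_t in z:
        log_s, b = conditioner(prev)
        x_t = z_t * math.exp(log_s) + b
        x.append(x_t)
        prev = x_t
    return x

def log_likelihood(x):
    """Exact log-likelihood via change of variables with a standard normal prior."""
    z, log_det = forward(x)
    log_prior = sum(-0.5 * (z_t * z_t + math.log(2 * math.pi)) for z_t in z)
    return log_prior + log_det
```

Training a flow of this kind amounts to maximizing `log_likelihood` over the data with respect to the conditioner's parameters. Because `inverse` exactly undoes `forward`, editing the latent values before inverting (for example, scaling them toward zero to reduce variation) changes the generated output in a controlled way.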
Quick Start & Requirements
Initialize submodules (git submodule update --init), and install requirements (pip install -r requirements.txt).
Highlighted Details
Maintenance & Community
This project is from NVIDIA. No specific community links or roadmap are provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. It mentions using code from other repositories, implying potential licensing considerations. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README does not detail specific limitations, unsupported platforms, or known bugs. The project appears to be research-oriented, and extensive fine-tuning or specific dataset preparation might be required for optimal performance.