TensorFlow 2 implementation for non-autoregressive text-to-speech
This repository provides a TensorFlow 2 implementation of a non-autoregressive Transformer model for Text-to-Speech (TTS). It aims to deliver fast, robust, and controllable speech synthesis, suitable for researchers and developers working on TTS systems.
How It Works
The core of the project is a non-autoregressive Transformer architecture, inspired by FastSpeech and FastPitch. This approach avoids sequential generation, leading to faster inference, improved robustness against repeats and attention failures, and explicit control over speech speed and pitch. The model generates mel-spectrograms, which are then converted to audio waveforms using external vocoders like MelGAN or HiFiGAN.
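The speed control described above can be illustrated with a length regulator in the style of FastSpeech: each input token's hidden vector is repeated for as many mel frames as its predicted duration, and dividing the durations by a speed factor shortens or lengthens the output. The following is a minimal pure-Python sketch under that assumption, not this repository's code; the function name `regulate_length` and its parameters are hypothetical.

```python
from typing import List, Sequence


def regulate_length(
    token_vectors: Sequence[Sequence[float]],
    durations: Sequence[float],
    speed: float = 1.0,
) -> List[Sequence[float]]:
    """Expand token-level vectors to frame level (hypothetical helper).

    Each token vector is repeated round(duration / speed) times, so
    speed > 1 yields fewer frames (faster speech) and speed < 1 yields
    more frames (slower speech).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if len(token_vectors) != len(durations):
        raise ValueError("need exactly one duration per token")

    frames: List[Sequence[float]] = []
    for vector, duration in zip(token_vectors, durations):
        # Round half up; a negative prediction is clamped to zero frames.
        repeats = max(0, int(duration / speed + 0.5))
        frames.extend([vector] * repeats)
    return frames


if __name__ == "__main__":
    # Three tokens with one-dimensional stand-in vectors.
    tokens = [[0.1], [0.2], [0.3]]
    predicted = [2, 4, 6]  # frames per token, as a duration predictor might output
    print(len(regulate_length(tokens, predicted)))             # 12 frames
    print(len(regulate_length(tokens, predicted, speed=2.0)))  # 6 frames
```

Because all frames are known once the durations are fixed, the decoder can produce the whole mel-spectrogram in parallel instead of one frame at a time, which is where the inference speedup of a non-autoregressive model comes from.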
Quick Start & Requirements
Install the Python dependencies: pip install -r requirements.txt
Install espeak (via apt-get or brew).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
According to the README, some pre-trained model weights are only compatible with a particular version of the repository, so the matching version must be checked out before loading them. WaveRNN support has been discontinued.