TransformerTTS by spring-media

TensorFlow 2 implementation of a non-autoregressive Transformer for text-to-speech

Created 5 years ago

1,159 stars

Top 33.3% on SourcePulse

Project Summary

This repository provides a TensorFlow 2 implementation of a non-autoregressive Transformer model for Text-to-Speech (TTS). It aims to deliver fast, robust, and controllable speech synthesis, suitable for researchers and developers working on TTS systems.

How It Works

The core of the project is a non-autoregressive Transformer architecture inspired by FastSpeech and FastPitch. Because the model predicts all mel-spectrogram frames in parallel instead of generating them one at a time, inference is faster and the output is more robust against repetitions and attention failures. The design also gives explicit control over speech speed and pitch. The predicted mel-spectrograms are then converted to audio waveforms by an external vocoder such as MelGAN or HiFiGAN.
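The speed control mentioned above can be illustrated with a FastSpeech-style length regulator, which repeats each phoneme encoding for its predicted number of mel frames. The following is a minimal pure-Python sketch of that general idea, not code from this repository: the function name length_regulate, its parameters, and the toy values are all hypothetical, and the real model operates on TensorFlow tensors.

```python
from typing import List, Sequence


def length_regulate(
    phoneme_vectors: Sequence[Sequence[float]],
    durations: Sequence[float],
    speed: float = 1.0,
) -> List[List[float]]:
    """Repeat each phoneme vector for its predicted number of mel frames.

    Hypothetical helper for illustration only.
    speed > 1 shortens durations (faster speech); speed < 1 lengthens them.
    """
    if len(phoneme_vectors) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    if speed <= 0:
        raise ValueError("speed must be positive")

    frames: List[List[float]] = []
    for vector, duration in zip(phoneme_vectors, durations):
        # Scale the predicted duration, then round half up to whole frames.
        n_frames = max(0, int(duration / speed + 0.5))
        # Copy the vector once per frame so frames do not share storage.
        frames.extend(list(vector) for _ in range(n_frames))
    return frames


# Three phonemes with toy 2-dimensional encodings and predicted durations.
encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
predicted = [2.0, 1.0, 3.0]

normal = length_regulate(encodings, predicted)
fast = length_regulate(encodings, predicted, speed=2.0)
print(len(normal), len(fast))  # prints "6 4"
```

Because every output frame is determined before decoding starts, the decoder can process all frames at once, and changing the speed value only changes how many frames are produced.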

Quick Start & Requirements

Install: pip install -r requirements.txt
Prerequisites: Python >= 3.6, espeak (install via apt-get or brew).
Pre-trained LJSpeech model available for quick inference.
Official Colab notebook available for trying out the model.
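Before installing, it can help to confirm that the two prerequisites listed above are in place. The sketch below uses only the Python standard library; the helper name check_prerequisites is hypothetical and is not part of the repository, and it assumes espeak is exposed as an executable called espeak on the PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the prerequisites


def check_prerequisites(min_python=MIN_PYTHON, binary="espeak"):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            "Python {}.{} or newer is required, found {}.{}".format(
                min_python[0], min_python[1],
                sys.version_info[0], sys.version_info[1],
            )
        )
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which(binary) is None:
        problems.append(
            "'{}' was not found on PATH (install via apt-get or brew)".format(binary)
        )
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```

If the script prints nothing, both checks passed and the pip install step can proceed.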

Highlighted Details

Non-autoregressive Transformer for fast and robust TTS.
Supports pitch prediction and controllable speech speed.
Compatible with MelGAN and HiFiGAN vocoders for waveform generation.
Includes scripts for training aligner, TTS models, and extracting durations.

Maintenance & Community

Maintained by Francesco Cardinale.
The README mentions a collaboration with the Mozilla TTS team.
No explicit links to community channels (Discord/Slack) or roadmaps are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text.
Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

According to the README, some pre-trained model weights are only compatible with specific versions of the repository, so the matching version must be checked out before loading them. Support for WaveRNN has been discontinued.

Explore Similar Projects

VoiceStar by jasonppy

Meta-voicebox by SpeechifyInc

Comprehensive-Transformer-TTS by keonlee9420

FastDiff by Rongjiehuang

VITA-Audio by VITA-MLLM

lora-svc by PlayVoice

vits2_pytorch by p0p4k

melgan by seungwonpark

ParallelWaveGAN by kan-bayashi

FastSpeech2 by ming024

espnet by espnet

GPT-SoVITS by RVC-Boss