TransformerTTS by spring-media

TensorFlow 2 implementation for non-autoregressive text-to-speech

Created 5 years ago
1,152 stars

Top 33.5% on SourcePulse

Project Summary

This repository provides a TensorFlow 2 implementation of a non-autoregressive Transformer model for text-to-speech (TTS). It targets fast, robust, and controllable speech synthesis and is intended for researchers and developers building TTS systems.

How It Works

The core of the project is a non-autoregressive Transformer architecture inspired by FastSpeech and FastPitch. Instead of generating output one step at a time, it produces all frames in parallel. This makes inference faster, improves robustness against repeats and attention failures, and allows explicit control over speech speed and pitch. The model generates mel-spectrograms, which external vocoders such as MelGAN or HiFiGAN then convert to audio waveforms.
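Models in the FastSpeech and FastPitch family bridge phoneme resolution and mel-frame resolution with a length regulator: each phoneme encoding is repeated for its predicted number of frames, so every frame can be decoded in parallel, and speed control reduces to scaling those durations. The sketch below illustrates that idea in plain Python. It is not the repository's TensorFlow code; the function names, the use of lists in place of tensors, and the rule that a phoneme with a positive duration keeps at least one frame are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def scale_durations(durations: Sequence[int], speed: float = 1.0) -> List[int]:
    """Scale per-phoneme durations (in mel frames) by 1/speed.

    speed > 1.0 yields fewer frames (faster speech); speed < 1.0 yields more.
    Assumption of this sketch: zero durations stay zero, and any positive
    duration is kept at >= 1 frame so no phoneme disappears after rounding.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    scaled = []
    for d in durations:
        if d <= 0:
            scaled.append(0)
        else:
            scaled.append(max(1, round(d / speed)))
    return scaled


def length_regulate(encodings: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Repeat each phoneme encoding durations[i] times to reach frame resolution."""
    if len(encodings) != len(durations):
        raise ValueError("need exactly one duration per phoneme encoding")
    frames: List[Vector] = []
    for vec, d in zip(encodings, durations):
        # copy the vector so later in-place edits to one frame don't alias others
        frames.extend(list(vec) for _ in range(d))
    return frames


if __name__ == "__main__":
    encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy 2-dim phoneme encodings
    durations = [2, 1, 3]                              # toy predicted frame counts
    print(len(length_regulate(encodings, durations)))                            # normal speed
    print(len(length_regulate(encodings, scale_durations(durations, speed=2.0))))  # faster
```

Because the expanded sequence length is known before decoding, the decoder can process every frame at once; this is the property that removes the step-by-step loop of autoregressive TTS.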

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >= 3.6 and espeak, used as the phonemizer backend (install via apt-get on Linux or brew on macOS).
  • Pre-trained LJSpeech model available for quick inference.
  • Official Colab notebook available for trying out the model.
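Before running pip install, a short script can confirm the two prerequisites listed above. This is a hypothetical helper, not part of the repository, and it assumes that either the classic espeak binary or the espeak-ng fork on the PATH is acceptable; verify that against the repository's phonemizer setup.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 6)) -> Dict[str, bool]:
    """Report whether the interpreter version and an espeak binary meet the prerequisites."""
    # Assumption: either binary name counts as "espeak is installed".
    espeak_found = any(shutil.which(name) is not None for name in ("espeak", "espeak-ng"))
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "espeak_found": espeak_found,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```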

Highlighted Details

  • Non-autoregressive Transformer for fast and robust TTS.
  • Supports pitch prediction and controllable speech speed.
  • Compatible with MelGAN and HiFiGAN vocoders for waveform generation.
  • Includes scripts for training the aligner and the TTS models, and for extracting durations.
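The duration-extraction step turns the aligner's attention weights into per-phoneme frame counts, which the non-autoregressive model then learns to predict. One simple approach, assumed here for illustration, assigns every mel frame to the phoneme it attends to most strongly and counts frames per phoneme. The repository's own script may use a different strategy (for example, one that enforces a monotonic alignment path), so treat this sketch and its function name as hypothetical.

```python
from typing import List, Sequence


def durations_from_attention(attention: Sequence[Sequence[float]]) -> List[int]:
    """Count, for each phoneme, how many mel frames attend to it most strongly.

    attention[t][p] is the aligner's attention weight from mel frame t to phoneme p.
    Returns one non-negative integer per phoneme; the counts sum to the number of frames.
    """
    if not attention:
        return []
    n_phonemes = len(attention[0])
    durations = [0] * n_phonemes
    for t, row in enumerate(attention):
        if len(row) != n_phonemes:
            raise ValueError(f"frame {t} has {len(row)} weights, expected {n_phonemes}")
        # argmax over phonemes; max() returns the first maximum, so ties go to the earlier phoneme
        best = max(range(n_phonemes), key=lambda p: row[p])
        durations[best] += 1
    return durations


if __name__ == "__main__":
    # toy attention: 5 mel frames over 3 phonemes
    attention = [
        [0.9, 0.1, 0.0],
        [0.6, 0.4, 0.0],
        [0.2, 0.7, 0.1],
        [0.0, 0.3, 0.7],
        [0.0, 0.1, 0.9],
    ]
    print(durations_from_attention(attention))
```

Plain argmax ignores monotonicity, so a noisy alignment can leave some phonemes with zero frames; that is one reason practical extractors tend to add constraints on the alignment path.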

Maintenance & Community

  • Maintained by Francesco Cardinale.
  • Mentions collaboration with the Mozilla TTS team.
  • No explicit links to community channels (Discord/Slack) or roadmaps are provided in the README.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README text.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that some pre-trained model weights work only with specific versions of the repository, so you must check out the matching version for API compatibility. Support for WaveRNN has been discontinued.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Benjamin Bolte (Cofounder of K-Scale Labs), and 3 more.

espnet by espnet

Top 0.2%
9k stars
End-to-end speech processing toolkit for various speech tasks
Created 7 years ago
Updated 3 days ago

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

Top 0.3%
51k stars
Few-shot voice cloning and TTS web UI
Created 1 year ago
Updated 1 week ago