TTS by coqui-ai

Deep learning toolkit for Text-to-Speech, research-tested

Created 5 years ago

44,620 stars

Top 0.6% on SourcePulse

13 Experts Love This Project

Author of "AI Engineering", "Designing Machine Learning Systems"

Abubakar Abid

Cofounder of Gradio

and 9 more!

Project Summary

🐸TTS is a deep learning toolkit for Text-to-Speech (TTS) synthesis, offering pretrained models that cover more than 1100 languages. It provides tools for training new TTS models, fine-tuning existing ones, and analyzing datasets, and it is intended for both research and production use.

How It Works

🐸TTS supports a wide array of TTS architectures, including spectrogram-based models like Tacotron2 and Glow-TTS, as well as end-to-end models such as VITS and YourTTS. It integrates various vocoder models (e.g., MelGAN, HiFiGAN) for high-fidelity audio generation and includes speaker encoder models for efficient speaker embedding extraction, enabling voice cloning and conversion capabilities.
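The split between two-stage and end-to-end models shows up in how pretrained models are named. The sketch below assumes the four-part identifier scheme used by the toolkit's model listing (model type, language, dataset, model name); the helper parse_model_id and the class ModelId are hypothetical names introduced here for illustration, and the specific identifiers are examples that should be checked against the current model list before use.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    """Hypothetical container for a '<type>/<language>/<dataset>/<name>' identifier."""
    model_type: str  # e.g. "tts_models" or "vocoder_models"
    language: str
    dataset: str
    name: str


def parse_model_id(identifier: str) -> ModelId:
    """Split a pretrained-model identifier into its four parts."""
    parts = identifier.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected '<type>/<language>/<dataset>/<name>', got {identifier!r}")
    return ModelId(*parts)


# Two-stage synthesis: a spectrogram model produces mel frames,
# and a separate vocoder turns them into a waveform.
acoustic = parse_model_id("tts_models/en/ljspeech/glow-tts")
vocoder = parse_model_id("vocoder_models/en/ljspeech/hifigan_v2")

# End-to-end synthesis: a single model maps text directly to audio,
# so no separate vocoder identifier is needed.
end_to_end = parse_model_id("tts_models/en/ljspeech/vits")
```

Pairing an acoustic model with a vocoder trained on the same dataset (here, ljspeech) is the usual choice, since the vocoder expects spectrograms with matching audio settings.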

Quick Start & Requirements

Installation: pip install TTS for synthesis; git clone https://github.com/coqui-ai/TTS && pip install -e .[all,dev,notebooks] for development. Docker images are also available.
Prerequisites: Python >= 3.9, < 3.12. GPU with CUDA is recommended for training and faster inference.
Documentation: ReadTheDocs
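As a minimal sketch of the steps above, the following checks the stated Python version range and then synthesizes a sentence through the toolkit's Python API. The helper names python_version_supported and synthesize_to_file are hypothetical, and the default model identifier is only an example; it assumes the package has been installed with pip install TTS and that the model can be downloaded on first use.

```python
import sys


def python_version_supported(version=None) -> bool:
    """Return True if the interpreter is within the stated range (>= 3.9, < 3.12)."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 9) <= major_minor < (3, 12)


def synthesize_to_file(
    text: str,
    out_path: str = "output.wav",
    model_name: str = "tts_models/en/ljspeech/tacotron2-DDC",  # example model, an assumption
) -> str:
    """Synthesize `text` to a WAV file and return the output path."""
    if not python_version_supported():
        raise RuntimeError("TTS requires Python >= 3.9 and < 3.12")
    # Imported lazily so this module can be loaded without the package installed.
    from TTS.api import TTS

    tts = TTS(model_name=model_name)  # downloads the model on first use
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

Calling synthesize_to_file("Hello from TTS.") should write output.wav in the working directory; on a machine without a GPU, inference runs on the CPU and is slower.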

Highlighted Details

Provides pretrained models covering more than 1100 languages.
Features voice cloning and voice conversion capabilities.
Includes streaming TTS with reported latency under 200 ms.
Offers tools for dataset analysis and curation.
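Voice cloning from a short reference recording can be sketched as follows. This is an illustration under assumptions rather than a definitive recipe: clone_voice is a hypothetical helper, the XTTS v2 model identifier is one example of a multilingual model that accepts a reference speaker file, and the model download may require accepting that model's own license terms.

```python
from pathlib import Path

# Example multilingual model that supports cloning from a reference clip (an assumption).
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def clone_voice(
    text: str,
    speaker_wav: str,
    out_path: str = "cloned.wav",
    language: str = "en",
) -> str:
    """Speak `text` in the voice of the recording at `speaker_wav`."""
    reference = Path(speaker_wav)
    if not reference.is_file():
        # Fail early, before loading a large model.
        raise FileNotFoundError(f"reference audio not found: {reference}")

    # Imported lazily so the path check above works without the packages installed.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS(XTTS_MODEL).to(device)
    tts.tts_to_file(
        text=text,
        speaker_wav=str(reference),
        language=language,
        file_path=out_path,
    )
    return out_path
```

The reference clip path and output path are placeholders; a few seconds of clean speech from the target speaker is typically used as the reference.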

Maintenance & Community

The project was developed by Coqui.ai. Community support for usage questions is available via Discord and GitHub Discussions, although the Health Check below shows no recent commit activity.

Licensing & Compatibility

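The project is licensed under the Mozilla Public License 2.0 (MPL-2.0), a file-level copyleft license. Commercial use is permitted, and the code can be combined with proprietary code in a larger work, but modifications to MPL-licensed files must be made available under the MPL-2.0 when distributed.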

Limitations & Caveats

The large number of models and configuration options makes for a steep learning curve. Some advanced features and newer models may be incomplete or may need specific hardware, such as a CUDA-capable GPU, to perform well.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

335 stars in the last 30 days