TTS by coqui-ai

Deep learning toolkit for Text-to-Speech, research-tested

Created 5 years ago
42,642 stars

Top 0.6% on SourcePulse

Project Summary

🐸TTS is a comprehensive deep learning toolkit for Text-to-Speech (TTS) synthesis, offering over 1100 pretrained models across numerous languages. It empowers researchers and developers with tools for training new TTS models, fine-tuning existing ones, and performing dataset analysis, making advanced speech synthesis accessible for both research and production environments.

How It Works

🐸TTS supports a wide array of TTS architectures, including spectrogram-based models like Tacotron2 and Glow-TTS, as well as end-to-end models such as VITS and YourTTS. It integrates various vocoder models (e.g., MelGAN, HiFiGAN) for high-fidelity audio generation and includes speaker encoder models for efficient speaker embedding extraction, enabling voice cloning and conversion capabilities.

Quick Start & Requirements

  • Installation: run pip install TTS to synthesize speech with released models. For development, run git clone https://github.com/coqui-ai/TTS, then pip install -e .[all,dev,notebooks] from inside the cloned directory. Docker images are also available.
  • Prerequisites: Python >= 3.9, < 3.12. GPU with CUDA is recommended for training and faster inference.
  • Documentation: ReadTheDocs
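
As a minimal sketch of the quick start above, the snippet below checks the stated Python range and then synthesizes one sentence to a WAV file. It assumes the package exposes `TTS.api.TTS` with a `tts_to_file` method, as in the project's README, and that the example model name is still in the released model list. The helper names `python_is_supported` and `synthesize` are hypothetical and not part of the library.

```python
import sys

# Version bounds taken from the prerequisites above: Python >= 3.9, < 3.12.
MIN_PYTHON = (3, 9)
MAX_PYTHON_EXCLUSIVE = (3, 12)

# Example model identifier (assumption: still present in the released model list).
DEFAULT_MODEL = "tts_models/en/ljspeech/tacotron2-DDC"


def python_is_supported(version=None):
    """Return True if the (major, minor) version is in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def synthesize(text, out_path="output.wav", model_name=DEFAULT_MODEL):
    """Synthesize `text` to a WAV file. Requires `pip install TTS`."""
    if not python_is_supported():
        raise RuntimeError("TTS expects Python >= 3.9 and < 3.12")
    # Imported lazily so this module still loads where TTS is not installed.
    from TTS.api import TTS

    tts = TTS(model_name=model_name)  # downloads the model on first use
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

Calling `synthesize("Hello from Coqui TTS.")` should write `output.wav` to the current directory once the package and model are available. The first call may take a while because the model is downloaded.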

Highlighted Details

  • Supports over 1100 pretrained models in multiple languages.
  • Features voice cloning and voice conversion capabilities.
  • Includes streaming TTS with low latency (<200ms).
  • Offers tools for dataset analysis and curation.
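
With this many pretrained models, it helps to know how they are named. The sketch below assumes the `type/language/dataset/name` identifier convention used in the project's model list and groups a few sample identifiers by language. The sample identifiers are illustrative, and `ModelId`, `parse_model_id`, and `group_by_language` are hypothetical helpers, not library functions.

```python
from collections import defaultdict
from typing import NamedTuple


class ModelId(NamedTuple):
    model_type: str  # e.g. "tts_models" or "vocoder_models"
    language: str    # e.g. "en", "de", "multilingual"
    dataset: str     # e.g. "ljspeech"
    name: str        # e.g. "glow-tts"


def parse_model_id(identifier: str) -> ModelId:
    """Split a 'type/language/dataset/name' identifier into its four fields."""
    parts = identifier.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected type/language/dataset/name, got {identifier!r}")
    return ModelId(*parts)


def group_by_language(identifiers):
    """Map each language code to the model names listed under it."""
    groups = defaultdict(list)
    for identifier in identifiers:
        model = parse_model_id(identifier)
        groups[model.language].append(model.name)
    return dict(groups)


# Illustrative identifiers only; check the project's model list for current names.
SAMPLE_IDS = [
    "tts_models/en/ljspeech/tacotron2-DDC",
    "tts_models/en/ljspeech/glow-tts",
    "tts_models/de/thorsten/tacotron2-DDC",
    "vocoder_models/en/ljspeech/hifigan_v2",
    "tts_models/multilingual/multi-dataset/your_tts",
]
```

For example, `group_by_language(SAMPLE_IDS)` collects the three English entries under `"en"`, which is a quick way to see what is available for a given language before downloading anything.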

Maintenance & Community

The project is developed by Coqui.ai, though the Health Check on this page reports the last commit as 1 year ago and responsiveness as inactive. Community support for usage questions is available via Discord and GitHub Discussions.

Licensing & Compatibility

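The project is licensed under the Mozilla Public License 2.0 (MPL-2.0), which permits commercial use. Its copyleft applies per file: if you distribute modified versions of MPL-licensed files, those files must remain available under the MPL, while your own separate files in a larger work may use a different license.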

Limitations & Caveats

The toolkit is extensive, and the sheer number of models and configurations can make for a steep learning curve. Some advanced features or newer models might still be under active development or require specific hardware configurations for optimal performance.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 9
  • Star History: 600 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

Top 0.1%, 4k stars
TTS model for human-like, expressive speech
Created 1 year ago
Updated 1 year ago
Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

Amphion by open-mmlab

Top 0.2%, 9k stars
Toolkit for audio, music, and speech generation research
Created 1 year ago
Updated 3 months ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

GPT-SoVITS by RVC-Boss

Top 0.3%, 51k stars
Few-shot voice cloning and TTS web UI
Created 1 year ago
Updated 1 week ago