TTS by coqui-ai

Deep learning toolkit for Text-to-Speech, research-tested

created 5 years ago
41,732 stars

Top 0.6% on sourcepulse

Project Summary

🐸TTS is a deep learning toolkit for Text-to-Speech (TTS) synthesis with pretrained models covering more than 1,100 languages. It provides tools for training new TTS models, fine-tuning existing ones, and analyzing datasets, for use in both research and production.

How It Works

🐸TTS supports a wide array of TTS architectures, including spectrogram-based models like Tacotron2 and Glow-TTS, as well as end-to-end models such as VITS and YourTTS. It integrates various vocoder models (e.g., MelGAN, HiFiGAN) for high-fidelity audio generation and includes speaker encoder models for efficient speaker embedding extraction, enabling voice cloning and conversion capabilities.
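
The difference between the two model families described above can be sketched in a few lines. This is a toy illustration only, not 🐸TTS code: the stage functions, the 80 mel bins, and the 256-sample hop length are assumptions chosen for the example, and the placeholder stages emit silence rather than speech.

```python
from typing import Callable, List

MelSpectrogram = List[List[float]]  # frames x mel bins
Waveform = List[float]              # raw audio samples

N_MELS = 80       # assumed number of mel bins per frame
HOP_LENGTH = 256  # assumed audio samples produced per spectrogram frame


def toy_acoustic_model(text: str) -> MelSpectrogram:
    """Placeholder for a spectrogram model: one silent frame per character."""
    return [[0.0] * N_MELS for _ in text]


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    """Placeholder for a vocoder: expands each frame to HOP_LENGTH samples."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def two_stage_tts(
    text: str,
    acoustic_model: Callable[[str], MelSpectrogram],
    vocoder: Callable[[MelSpectrogram], Waveform],
) -> Waveform:
    """Spectrogram-based pipeline: text -> mel spectrogram -> waveform."""
    return vocoder(acoustic_model(text))


def end_to_end_tts(text: str, model: Callable[[str], Waveform]) -> Waveform:
    """End-to-end pipeline: a single model maps text straight to a waveform."""
    return model(text)
```

In the two-stage case the acoustic model and the vocoder can be swapped independently, whereas an end-to-end model is trained and used as one unit.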

Quick Start & Requirements

  • Installation: run pip install TTS to synthesize speech with pretrained models. For development, run git clone https://github.com/coqui-ai/TTS, then pip install -e .[all,dev,notebooks] from inside the cloned directory. Docker images are also available.
  • Prerequisites: Python >= 3.9, < 3.12. GPU with CUDA is recommended for training and faster inference.
  • Documentation: ReadTheDocs
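
After installation, a first synthesis call might look like the sketch below. It assumes the package's Python entry point TTS.api.TTS and its tts_to_file method; the model name is an example picked for illustration and is not named in this summary. The python_supported helper is hypothetical and only mirrors the version range listed in the prerequisites.

```python
import sys

# Supported interpreter range as stated in the prerequisites: >= 3.9, < 3.12.
MIN_PYTHON = (3, 9)
MAX_PYTHON_EXCLUSIVE = (3, 12)


def python_supported(version=None):
    """Return True if the (major, minor) version is in the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def synthesize(text, out_path="output.wav",
               model_name="tts_models/en/ljspeech/tacotron2-DDC"):
    """Synthesize `text` to a WAV file. The model name is an assumed example."""
    if not python_supported():
        raise RuntimeError("TTS expects Python >= 3.9 and < 3.12")
    # Imported lazily so this module loads even where TTS is not installed.
    from TTS.api import TTS

    tts = TTS(model_name=model_name)  # downloads the model on first use
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

Calling synthesize("Hello world.") would write output.wav to the current directory, provided the package is installed and the model can be downloaded.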

Highlighted Details

  • Provides pretrained models covering more than 1,100 languages.
  • Features voice cloning and voice conversion capabilities.
  • Supports streaming synthesis with latency under 200 ms.
  • Offers tools for dataset analysis and curation.
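
As a rough illustration of the voice cloning feature, the sketch below passes a short reference recording to a multi-speaker model. It assumes the TTS.api.TTS class and the speaker_wav and language arguments of tts_to_file; the YourTTS model identifier and the file paths are assumptions for the example, and clone_voice is a hypothetical helper.

```python
from pathlib import Path


def clone_voice(text, reference_wav, out_path="cloned.wav", language="en",
                model_name="tts_models/multilingual/multi-dataset/your_tts"):
    """Speak `text` in the voice of `reference_wav` (assumed model name)."""
    ref = Path(reference_wav)
    # Fail early, before any model download, if the reference clip is missing.
    if not ref.is_file():
        raise FileNotFoundError(f"reference audio not found: {ref}")
    # Imported lazily so this module loads even where TTS is not installed.
    from TTS.api import TTS

    tts = TTS(model_name=model_name)
    tts.tts_to_file(
        text=text,
        speaker_wav=str(ref),  # recording of the voice to imitate
        language=language,
        file_path=out_path,
    )
    return out_path
```

The reference clip is checked first so that a wrong path is reported immediately instead of after the model has loaded.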

Maintenance & Community

The project is developed by Coqui.ai. Community support for usage questions is available via Discord and GitHub Discussions.

Licensing & Compatibility

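The project is licensed under the Mozilla Public License 2.0 (MPL-2.0), a file-level copyleft license. Commercial use is permitted. If you distribute modified versions of MPL-covered files, those files must remain available under the MPL, while code in separate files of a larger work can carry a different license.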

Limitations & Caveats

The toolkit is extensive, and the number of models and configurations makes for a steep learning curve. Some advanced features and newer models may still be under active development or may need specific hardware to perform well.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 3
  • Issues (30d): 9

Star History

2,240 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems).

  • GPT-SoVITS by RVC-Boss: Few-shot voice cloning and TTS web UI (49k stars, 0.6%; created 1 year ago, updated 2 weeks ago)