alltalk_tts  by erew123

Text-to-speech tool based on Coqui TTS engine

created 1 year ago
1,937 stars

Top 23.1% on sourcepulse

Project Summary

AllTalk TTS is a Python-based text-to-speech extension for the Text Generation Web UI that can also be installed standalone. It targets users who want high-quality, customizable speech synthesis in conversational AI or content-creation workflows, and it builds on the Coqui TTS engine.

How It Works

AllTalk TTS is built upon the Coqui TTS engine, specifically supporting XTTSv2 models. It provides a user-friendly interface with features like model finetuning, support for custom local models, and efficient batch processing. Advanced options include DeepSpeed for performance acceleration and a low VRAM mode, making it accessible even on hardware with limited GPU memory.

Quick Start & Requirements

  • Installation: Clone the repository into your Text Generation Web UI extensions folder (git clone https://github.com/erew123/alltalk_tts) or use the provided setup scripts (atsetup.bat / ./atsetup.sh) for standalone installations.
  • Prerequisites: Python 3.9-3.11.x (tested with 3.11.x), Git. For DeepSpeed, an NVIDIA GPU is required. Windows users may need C++ build tools.
  • Resources: Initial model download is ~1.8GB.
  • Documentation: see the AllTalk TTS GitHub repository (https://github.com/erew123/alltalk_tts).
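
The prerequisites above can be checked before running the setup scripts. The following is a minimal sketch, not part of AllTalk itself: the function names are hypothetical, and the only facts taken from this page are the Python 3.9-3.11.x range and the need for Git.

```python
import shutil
import sys

# Supported range taken from the prerequisites above: Python 3.9 through 3.11.x.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def python_supported(version=None):
    """Return True if the (major, minor) version falls inside 3.9-3.11."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


def git_available():
    """Return True if a `git` executable is found on PATH."""
    return shutil.which("git") is not None


def check_prerequisites():
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not python_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} is outside the 3.9-3.11.x range")
    if not git_available():
        problems.append("git was not found on PATH")
    return problems


if __name__ == "__main__":
    # Hypothetical helper names; this script is not shipped with AllTalk.
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The sketch does not check for an NVIDIA GPU or for C++ build tools, which matter only for DeepSpeed and for some Windows installations.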

Highlighted Details

  • Supports model finetuning for custom voice training.
  • Offers DeepSpeed integration for 2-3x speed improvements.
  • Includes a low VRAM mode for reduced memory footprint.
  • Provides an API suite for integration with third-party software via JSON calls.
  • Features a bulk TTS generator for large text volumes.
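
To illustrate what bulk generation over large text volumes involves, the sketch below splits a long text into sentence-aligned chunks that could then be synthesized one at a time. It is an assumption-laden illustration, not AllTalk's actual implementation: the function name `split_into_chunks` and the 250-character default are hypothetical.

```python
import re


def split_into_chunks(text, max_chars=250):
    """Group sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than
    being cut mid-sentence. The 250-character default is an arbitrary
    illustrative value, not a documented AllTalk limit.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character count avoids cutting a sentence in the middle, which would otherwise produce an audible break in the synthesized speech.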

Maintenance & Community

The project is maintained by a solo developer, with community support encouraged through discussions and issue reporting. Links to Discord/Slack are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility with commercial or closed-source linking is not specified.

Limitations & Caveats

The project is primarily focused on Version 1, with Version 2 still evolving. Docker and Google Colab support are noted as experimental or in development. The developer did not create the underlying TTS models and advises users to consult the original model developers about model-specific issues.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 4

Star History

197 stars in the last 90 days

Explore Similar Projects

Starred by Michael Han (Cofounder of Unsloth), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 7 more.

TTS by coqui-ai

0.4%
42k
Deep learning toolkit for Text-to-Speech, research-tested
created 5 years ago
updated 11 months ago