alltalk_tts by erew123

Text-to-speech tool based on Coqui TTS engine

Created 2 years ago

2,258 stars

Top 19.6% on SourcePulse

Project Summary

AllTalk TTS is a Python-based text-to-speech extension for the Text Generation Web UI that can also be installed standalone. Built on the Coqui TTS engine, it targets users who want to integrate high-quality, customizable speech synthesis into their workflows, particularly for conversational AI or content creation.

How It Works

AllTalk TTS is built on the Coqui TTS engine and specifically supports XTTSv2 models. Its interface offers model finetuning, support for custom local models, and efficient batch processing. Advanced options include DeepSpeed for faster generation and a low VRAM mode that keeps it usable on GPUs with limited memory.
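Batch processing of long text generally starts by cutting it into pieces small enough to synthesize one at a time. The sketch below is a generic illustration of that idea, not AllTalk's own splitter: it assumes sentence boundaries are marked by `.`, `!` or `?`, and the `split_for_batch` name and the `max_chars` limit of 250 are hypothetical choices.

```python
import re
import textwrap


def split_for_batch(text: str, max_chars: int = 250) -> list[str]:
    """Split text into sentence-aligned chunks of at most max_chars characters.

    Hypothetical helper for illustration only. Sentences longer than
    max_chars are wrapped on word boundaries (very long words are broken).
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Oversized sentences are wrapped so every piece fits within max_chars.
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = textwrap.wrap(sentence, max_chars, break_long_words=True)
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= max_chars:
                # Keep packing pieces into the current chunk while they fit.
                current = candidate
            else:
                # Current chunk is full: emit it and start a new one.
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences together, rather than cutting at a fixed character count, avoids splitting a sentence mid-phrase, which tends to produce unnatural prosody at chunk boundaries.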

Quick Start & Requirements

Installation: Clone the repository into your Text Generation Web UI extensions folder (git clone https://github.com/erew123/alltalk_tts) or use the provided setup scripts (atsetup.bat / ./atsetup.sh) for standalone installations.
Prerequisites: Python 3.9-3.11.x (tested with 3.11.x), Git. For DeepSpeed, an NVIDIA GPU is required. Windows users may need C++ build tools.
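Before installing, it can help to confirm the interpreter and Git prerequisites locally. This is a minimal sketch that encodes only the range stated above (Python 3.9 through 3.11.x) and checks that a `git` executable is on the PATH; the function names are hypothetical and it does not check the DeepSpeed or C++ build tool requirements.

```python
import shutil
import sys

# Documented supported range: Python 3.9 through 3.11.x (inclusive).
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 11)


def is_supported(version=None) -> bool:
    """Return True if (major, minor) of the given version is within 3.9-3.11."""
    if version is None:
        version = sys.version_info
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


def has_git() -> bool:
    """Return True if a git executable can be found on the PATH."""
    return shutil.which("git") is not None


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "supported" if is_supported(current) else "outside the documented range"
    print(f"Python {'.'.join(map(str, current))} is {status} (3.9-3.11.x).")
    print("Git found." if has_git() else "Git not found on PATH.")
```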
Resources: The initial model download is about 1.8 GB.
Documentation: AllTalk TTS GitHub

Highlighted Details

Supports model finetuning for custom voice training.
Offers DeepSpeed integration, with reported speed improvements of 2-3x.
Includes a low VRAM mode for reduced memory footprint.
Provides an API suite for integration with third-party software via JSON calls.
Features a bulk TTS generator for large text volumes.
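A third-party client would typically reach the API over HTTP and read a JSON reply. The sketch below only builds the request and does not send it unless `send` is called against a running server. The base URL and port (`http://127.0.0.1:7851`), the `/api/tts-generate` path, and every form field name are assumptions based on the v1 API as we understand it; verify them against the project's API documentation for your version.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTIONS: base URL, port, endpoint path, and field names are illustrative
# and must be checked against the AllTalk API documentation for your version.
BASE_URL = "http://127.0.0.1:7851"
ENDPOINT = "/api/tts-generate"


def build_tts_request(text: str, voice: str = "female_01.wav",
                      language: str = "en", output_name: str = "output") -> Request:
    """Build (but do not send) a form-encoded POST asking the server to speak text."""
    fields = {
        "text_input": text,
        "text_filtering": "standard",
        "character_voice_gen": voice,  # assumed voice sample file name
        "narrator_enabled": "false",
        "language": language,
        "output_file_name": output_name,
        "output_file_timestamp": "true",
        "autoplay": "false",
    }
    body = urlencode(fields).encode("utf-8")
    return Request(
        BASE_URL + ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send(request: Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the server's JSON reply (needs a running server)."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(build_tts_request("Hello there."))` would return the decoded JSON reply; which keys it contains depends on the AllTalk version, so inspect the response rather than relying on a fixed schema.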

Maintenance & Community

The project is maintained by a solo developer, with community support encouraged through discussions and issue reporting. Links to Discord/Slack are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility with commercial or closed-source linking is not specified.

Limitations & Caveats

The project is primarily focused on Version 1, with Version 2 still evolving. Docker and Google Colab support are noted as experimental or in development. The developer did not create the underlying TTS models and advises users to consult the original model developers about model-specific issues.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Star History

34 stars in the last 30 days