Multi-voice TTS system emphasizing quality, realistic prosody
Tortoise TTS is a high-quality, multi-voice text-to-speech system designed for realistic prosody and intonation. It targets researchers and developers needing advanced TTS capabilities, offering a significant improvement in naturalness over standard TTS models.
How It Works
Tortoise TTS employs a dual-decoder architecture, combining an autoregressive decoder with a diffusion decoder. This approach allows for highly detailed and natural-sounding speech generation, capturing nuances in intonation and prosody. The model is trained for quality, prioritizing realistic voice output.
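From the caller's side, both decoders sit behind a single call. The following is a minimal sketch, assuming the upstream package is installed and that its Python API (TextToSpeech, tts_with_preset, load_voice) still matches the upstream README. The helper names check_preset and synthesize are hypothetical and introduced here only for illustration; the preset list and the 24 kHz output rate are taken from the upstream repository, not from this summary.

```python
from pathlib import Path

# Preset names as found in the upstream repository (assumption: unchanged).
PRESETS = ("ultra_fast", "fast", "standard", "high_quality")


def check_preset(preset: str) -> str:
    """Return the preset unchanged, or raise if it is not a known name."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return preset


def synthesize(text: str, voice: str = "random", preset: str = "fast",
               out_path: str = "output.wav") -> Path:
    """Hypothetical wrapper: run both decoder stages and write a WAV file."""
    check_preset(preset)

    # Heavy imports are deferred so the module loads without a GPU stack.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_voice

    tts = TextToSpeech()
    # load_voice returns reference clips and/or precomputed latents for a voice.
    voice_samples, conditioning_latents = load_voice(voice)
    audio = tts.tts_with_preset(
        text,
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset=preset,
    )
    # Assumption: output is sampled at 24 kHz, as in the upstream scripts.
    torchaudio.save(str(out_path), audio.squeeze(0).cpu(), 24000)
    return Path(out_path)
```

Calling synthesize("Hello there.", preset="ultra_fast") would typically trigger a model download on first use, so the first run can take considerably longer than later ones.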
Quick Start & Requirements
Install from PyPI: pip install tortoise-tts
Or install from GitHub: pip install git+https://github.com/neonbjb/tortoise-tts
From a local checkout: python setup.py install
Docker is also provided.
Highlighted Details
Generation presets include fast and ultra_fast.
Scripts include read.py and read_fast.py.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project's initial description calls the model "insanely slow", though later updates claim significant speed improvements. CPU-only inference is not supported for the Hugging Face demo. DeepSpeed is disabled on Apple Silicon.
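The DeepSpeed caveat can be handled in code by choosing constructor options per platform. This is a hedged sketch, not the project's own logic: is_apple_silicon and tts_kwargs are hypothetical helpers, and the keyword names use_deepspeed and kv_cache are assumed to match the TextToSpeech constructor shown in the upstream README.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Detect Apple Silicon; arguments default to the running machine."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # A Python build running under Rosetta reports x86_64 and is not detected.
    return system == "Darwin" and machine == "arm64"


def tts_kwargs(system=None, machine=None, want_deepspeed=True):
    """Build keyword arguments for TextToSpeech (assumed parameter names)."""
    return {
        # DeepSpeed is disabled on Apple Silicon, so never request it there.
        "use_deepspeed": want_deepspeed and not is_apple_silicon(system, machine),
        "kv_cache": True,
    }
```

The resulting dictionary could then be passed as TextToSpeech(**tts_kwargs()), assuming those keyword names are still accepted.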