tortoise-tts by neonbjb

Multi-voice TTS system emphasizing quality, realistic prosody

Created 3 years ago
14,592 stars

Top 3.4% on SourcePulse

Project Summary

Tortoise TTS is a multi-voice text-to-speech system designed for realistic prosody and intonation. It targets researchers and developers who need advanced TTS capabilities, and it aims for noticeably more natural speech than standard TTS models.

How It Works

Tortoise TTS uses two decoders: an autoregressive decoder and a diffusion decoder. Combining them produces detailed, natural-sounding speech that captures nuances of intonation and prosody. The model is trained for quality, prioritizing realistic voice output over inference speed.

Quick Start & Requirements

  • Install: pip install tortoise-tts or pip install git+https://github.com/neonbjb/tortoise-tts
  • Prerequisites: NVIDIA GPU (CUDA 11.7+ recommended), Python 3.9+. Conda installation is highly recommended for Windows to manage dependencies. Apple Silicon requires PyTorch nightly builds.
  • Setup: Local installation involves cloning the repo, setting up a Conda environment, installing PyTorch, and running python setup.py install. Docker is also provided.
  • Docs: Manuscript, Hugging Face Space
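
Once the package is installed, programmatic use might look like the following minimal sketch. It assumes the `TextToSpeech` class, `tts_with_preset` method, and `load_audio` helper as shown in the upstream README, plus the four preset names documented there; verify these against the installed version. The `check_preset` and `synthesize` helpers, the clip paths, and the output filename are hypothetical names introduced for illustration. Running `synthesize` needs a supported GPU, and the first call is expected to download model weights.

```python
# Preset names as listed in the upstream README (assumption: unchanged in your version).
PRESETS = ("ultra_fast", "fast", "standard", "high_quality")


def check_preset(name):
    """Hypothetical helper: reject unknown preset names before loading the model."""
    if name not in PRESETS:
        raise ValueError(f"unknown preset {name!r}; expected one of {PRESETS}")
    return name


def synthesize(text, clip_paths, preset="fast", out_path="generated.wav"):
    """Hypothetical wrapper: clone a voice from reference clips and save the result."""
    preset = check_preset(preset)

    # Imported lazily so this module can be loaded without tortoise-tts installed.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    # Reference clips are loaded at 22,050 Hz, following the README example.
    clips = [load_audio(path, 22050) for path in clip_paths]

    tts = TextToSpeech()
    audio = tts.tts_with_preset(text, voice_samples=clips, preset=preset)

    # Output is written at 24 kHz, following the repository's example scripts.
    torchaudio.save(out_path, audio.squeeze(0).cpu(), 24000)
    return out_path
```

A call such as `synthesize("Hello there.", ["clip1.wav", "clip2.wav"], preset="ultra_fast")` would trade some quality for speed; the slower presets are the ones to try when output quality matters more than turnaround.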

Highlighted Details

  • Reports a real-time factor (RTF) of 0.25-0.3 on 4 GB of VRAM; with streaming, latency can fall below 500 ms.
  • Supports multiple voices and programmatic API usage.
  • Offers presets for faster inference (fast, ultra_fast).
  • Includes tools for batch processing text files (read.py, read_fast.py).
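
To put the RTF figure above in concrete terms, here is a small arithmetic sketch. The listing does not define RTF, so this assumes the usual definition: processing time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The function names are illustrative and not part of Tortoise.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF under the assumed definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def synthesis_time(audio_seconds, rtf):
    """Estimated seconds of compute needed to produce audio_seconds of speech."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # One minute of speech at the two ends of the quoted 0.25-0.3 range.
    for rtf in (0.25, 0.3):
        print(f"RTF {rtf}: {synthesis_time(60, rtf):.1f} s of compute per 60 s of audio")
```

Under this definition, the quoted range works out to roughly 15 to 18 seconds of compute per minute of generated speech.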

Maintenance & Community

  • Developed by James Betker; employer not involved.
  • Maintenance status is uncertain: the last commit was 10 months ago and responsiveness is rated inactive, although pull requests and issues are still being opened.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The project's original description calls the model "insanely slow", though later updates claim significant speed improvements. The Hugging Face demo does not support CPU-only inference. DeepSpeed is disabled on Apple Silicon.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 1
  • Star history: 92 stars in the last 30 days
