ChatTTS by 2noise

Generative speech model for daily dialogue

created 1 year ago
37,320 stars

Top 0.8% on sourcepulse

Project Summary

ChatTTS is a generative speech model optimized for dialogue scenarios, targeting LLM assistants and conversational AI applications. It produces natural, expressive speech and gives fine-grained control over prosody, including laughter and pauses. Its stated aim is to surpass existing open-source TTS models in conversational quality.

How It Works

ChatTTS is built specifically for dialogue-style speech synthesis rather than general-purpose read-aloud speech. It supports multiple speakers and offers fine-grained control over prosodic features such as laughter, pauses, and interjections. The model is trained on a large corpus of Chinese and English audio (100,000+ hours), with the goal of delivering better prosody than other open-source TTS solutions.
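To make the prosody control concrete, here is a minimal sketch of how control tags might be assembled before synthesis. It assumes the tag conventions described in the project's README: inline markers such as [laugh] and [uv_break] in the input text, and a refinement prompt built from [oral_N], [laugh_N], and [break_N]. The function names, the tag ranges, and the range checks are illustrative assumptions, not part of ChatTTS itself, so verify them against the current README before relying on them.

```python
# Hypothetical helpers: ChatTTS does not ship these functions.
# Tag names and level ranges are assumptions based on the project's README.
_TAG_MAX = {"oral": 9, "laugh": 2, "break": 7}  # inclusive upper bounds


def build_refine_prompt(oral: int, laugh: int, pause: int) -> str:
    """Return a refinement prompt such as '[oral_2][laugh_0][break_6]'."""
    levels = {"oral": oral, "laugh": laugh, "break": pause}
    parts = []
    for tag, level in levels.items():
        upper = _TAG_MAX[tag]
        if not 0 <= level <= upper:
            raise ValueError(f"{tag} level must be in 0..{upper}, got {level}")
        parts.append(f"[{tag}_{level}]")
    return "".join(parts)


def add_inline_markers(sentence: str, laugh: bool = False, pause: bool = False) -> str:
    """Append word-level markers ([uv_break], [laugh]) to a sentence."""
    markers = []
    if pause:
        markers.append("[uv_break]")
    if laugh:
        markers.append("[laugh]")
    return " ".join([sentence.strip(), *markers])


print(build_refine_prompt(oral=2, laugh=0, pause=6))
print(add_inline_markers("What is your favorite food?", laugh=True, pause=True))
```

If the README's API is unchanged, the prompt string would be passed through ChatTTS.Chat.RefineTextParams(prompt=...) as the params_refine_text argument of chat.infer; treat that wiring as an assumption to confirm.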

Quick Start & Requirements

  • Install: pip install ChatTTS from PyPI, or pip install -e . from a cloned copy of the repository for local development.
  • Prerequisites: Python 3.11+, PyTorch, torchaudio. Optional: vLLM (Linux), TransformerEngine (Linux), FlashAttention-2 (NVIDIA GPU).
  • Resources: At least 4 GB of VRAM to synthesize a 30-second audio clip. On an RTX 4090, inference runs at roughly 7 semantic tokens/sec, with a real-time factor (RTF) of about 0.3.
  • Links: Hugging Face models, Colab example
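The following is a minimal sketch of basic inference after installation, plus a small helper that turns the quoted RTF into a time estimate (RTF is synthesis time divided by audio duration, so a 30-second clip at RTF 0.3 takes about 9 seconds). The ChatTTS calls (ChatTTS.Chat, load, infer) follow the project's README as of this writing. The 24 kHz sample rate, the assumption that infer returns NumPy arrays, and the helper names are assumptions for illustration, and the API may have changed since.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: the README example saves audio at 24 kHz


def estimated_synthesis_seconds(audio_seconds: float, rtf: float = 0.3) -> float:
    """RTF = synthesis time / audio duration, so time = RTF * duration."""
    return rtf * audio_seconds


def to_pcm16(samples) -> bytes:
    """Convert float samples in [-1, 1] to 16-bit little-endian PCM bytes."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def synthesize(texts, out_prefix="output"):
    """Write one mono WAV file per input text and return the file paths.

    Requires ChatTTS to be installed and its model weights to be available.
    """
    import ChatTTS  # imported lazily so the helpers above run without it

    chat = ChatTTS.Chat()
    chat.load(compile=False)  # the README suggests compile=True for more speed
    wavs = chat.infer(texts)

    paths = []
    for i, wav in enumerate(wavs):
        # Assumption: each result is a NumPy float array; flatten to 1-D.
        samples = wav.reshape(-1).tolist()
        path = f"{out_prefix}{i}.wav"
        with wave.open(path, "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
            f.setframerate(SAMPLE_RATE)
            f.writeframes(to_pcm16(samples))
        paths.append(path)
    return paths


# Rough time to synthesize a 30-second clip at the quoted RTF of ~0.3.
print(estimated_synthesis_seconds(30))
```

Calling synthesize(["Hello there.", "How are you today?"]) would write output0.wav and output1.wav to the current directory. The WAV files are written with the standard-library wave module so the sketch does not depend on a particular torchaudio version.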

Highlighted Details

  • Optimized for conversational AI and LLM assistants.
  • Supports fine-grained control over prosody (laughter, pauses, interjections).
  • Reported by the authors to deliver better prosody than many open-source TTS models.
  • Trained on 100,000+ hours of Chinese and English audio data.

Maintenance & Community

Licensing & Compatibility

  • Code License: AGPLv3+
  • Model License: CC BY-NC 4.0 (Non-commercial, educational, and research use only).

Limitations & Caveats

The released model is for academic and research purposes only and cannot be used commercially. The authors have intentionally added noise and compressed audio quality to deter malicious use. English synthesis is noted as experimental.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 8

Star History

  • 1,467 stars in the last 90 days

Explore Similar Projects

Starred by Dan Guido (Cofounder of Trail of Bits), Joe Walnes (Head of Experimental Projects at Stripe), and 1 more.

  • chatterbox by resemble-ai: Open-source TTS model. 1.6%, 10k; created 3 months ago, updated 1 day ago.