chatterbox  by resemble-ai

Open-source TTS model

Created 4 months ago
12,997 stars

Top 3.8% on SourcePulse

Project Summary

Chatterbox TTS is an open-source text-to-speech model designed for content creators, developers, and AI agent builders. It offers state-of-the-art zero-shot voice cloning and unique emotion exaggeration control, aiming to provide high-quality, expressive speech synthesis that rivals closed-source solutions.

How It Works

Chatterbox uses a Llama backbone and alignment-informed inference to keep audio generation stable. Its key innovation is emotion exaggeration control, which lets users adjust the intensity and expressiveness of the synthesized speech. Together with training on a large dataset, this is intended to give higher quality and finer control over voice generation.

Quick Start & Requirements

  • Install via pip: pip install chatterbox-tts
  • Requires Python 3.11 and CUDA. Tested on Debian 11.
  • Official Hugging Face Gradio app available for immediate testing.
  • Example usage scripts (example_tts.py, example_vc.py) are provided.
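
As a rough illustration of the steps above, a minimal synthesis call might look like the sketch below. The names ChatterboxTTS, from_pretrained, generate, and model.sr are not stated in this summary; they are assumptions based on the project's published examples and should be checked against example_tts.py. The synthesize helper itself is hypothetical.

```python
def synthesize(text: str, out_path: str, device: str = "cuda") -> str:
    """Generate speech for `text` and write it to `out_path` as a WAV file.

    Assumes `pip install chatterbox-tts` has been run and that the package
    exposes ChatterboxTTS as described in the lead-in (an assumption).
    """
    # Heavy imports are deferred so that defining this helper does not
    # require the model stack to be installed.
    import torchaudio as ta
    from chatterbox.tts import ChatterboxTTS  # assumed import path

    # Model weights are fetched on first use.
    model = ChatterboxTTS.from_pretrained(device=device)

    # generate() is assumed to return a waveform tensor; model.sr is the
    # assumed sample-rate attribute.
    wav = model.generate(text)
    ta.save(out_path, wav, model.sr)
    return out_path
```

Calling synthesize("Hello from Chatterbox.", "hello.wav") would then write a WAV file to the working directory, provided a CUDA device is available.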

Highlighted Details

  • Benchmarked against and preferred over leading closed-source systems like ElevenLabs.
  • Features unique emotion exaggeration control for expressive speech.
  • Implements PerTh Watermarking for responsible AI, embedding imperceptible watermarks into outputs.
  • Supports cloning a reference voice by passing a sample clip via the audio_prompt_path argument.
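
The audio_prompt_path argument and the emotion exaggeration control listed above could be combined as in the following sketch. The exaggeration keyword and its default of 0.5 are assumptions drawn from the project's published examples, not from this summary; generation_options and speak are hypothetical helpers, and the non-negativity check is an illustrative sanity check rather than a documented constraint.

```python
from typing import Any, Dict, Optional


def generation_options(
    exaggeration: float = 0.5,
    audio_prompt_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for a generate() call (hypothetical helper)."""
    if exaggeration < 0:
        raise ValueError("exaggeration must be non-negative")
    options: Dict[str, Any] = {"exaggeration": float(exaggeration)}
    # Only pass a reference clip when one is supplied; otherwise the
    # model's built-in voice is used.
    if audio_prompt_path is not None:
        options["audio_prompt_path"] = audio_prompt_path
    return options


def speak(text: str, out_path: str, **options: Any) -> None:
    """Synthesize `text` with the given options and save it as a WAV file."""
    # Deferred imports: only needed when audio is actually generated.
    import torchaudio as ta
    from chatterbox.tts import ChatterboxTTS  # assumed import path

    model = ChatterboxTTS.from_pretrained(device="cuda")
    wav = model.generate(text, **options)  # assumed keyword names
    ta.save(out_path, wav, model.sr)
```

For example, speak("What a day!", "excited.wav", **generation_options(0.7, "reference.wav")) would request more expressive speech in the voice of reference.wav.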

Maintenance & Community

  • Developed by Resemble AI.
  • Active Discord community for support and collaboration.

Licensing & Compatibility

  • Licensed under MIT, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

  • Currently supports only English language synthesis.
  • Despite the favorable benchmark results, output quality can vary with the quality and speaking style of the prompt audio.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 23
  • Issues (30d): 46

Star History

  • 2,800 stars in the last 30 days
