chatterbox by resemble-ai

Open-source TTS model

Created 10 months ago

22,843 stars

Top 1.9% on SourcePulse

12 Experts Love This Project

transitive-bullshit

Founder of Agentic

ebursztein

Cybersecurity Lead at Google DeepMind

gakonst

Georgios Konstantopoulos

CTO, General Partner at Paradigm

zhyncs

Inference Lead at SGLang; Research Scientist at Together AI

and 8 more!

Project Summary

Chatterbox TTS is an open-source text-to-speech model designed for content creators, developers, and AI agent builders. It offers state-of-the-art zero-shot voice cloning and unique emotion exaggeration control, aiming to provide high-quality, expressive speech synthesis that rivals closed-source solutions.

How It Works

Chatterbox is built on a Llama backbone and uses alignment-informed inference to keep audio generation stable. Its distinguishing feature is emotion exaggeration control, which lets users adjust the intensity and expressiveness of the synthesized speech. Combined with training on a large dataset, this is intended to give high quality and fine control over voice generation.

Quick Start & Requirements

Install via pip: pip install chatterbox-tts
Requires Python 3.11 and CUDA. Tested on Debian 11.
Official Hugging Face Gradio app available for immediate testing.
Example usage scripts (example_tts.py, example_vc.py) are provided.
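The requirements above can be checked before the model is loaded. The following is a minimal sketch, not the project's own script: `check_environment` and `synthesize` are hypothetical helper names, and the calls `ChatterboxTTS.from_pretrained(device=...)`, `model.generate(text)` and `model.sr` are assumed to match the package's published example usage.

```python
import sys


def check_environment(version_info=sys.version_info):
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    # The listing states Python 3.11; compare major/minor only.
    if tuple(version_info[:2]) != (3, 11):
        problems.append(
            f"Python 3.11 expected, found {version_info[0]}.{version_info[1]}"
        )
    try:
        import torch  # imported lazily so the check itself never crashes
        if not torch.cuda.is_available():
            problems.append("CUDA device not available")
    except ImportError:
        problems.append("torch is not installed")
    return problems


def synthesize(text, out_path="output.wav", device="cuda"):
    """Generate speech for `text` and save it as a WAV file.

    Assumption: the API below follows the package's example scripts.
    """
    import torchaudio as ta
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text)
    ta.save(out_path, wav, model.sr)  # model.sr is the model's sample rate
    return out_path
```

A typical use would be to call `check_environment()` first and run `synthesize("Hello from Chatterbox.")` only when it returns an empty list.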

Highlighted Details

Benchmarked against leading closed-source systems like ElevenLabs and, according to the project, consistently preferred in side-by-side evaluations.
Features unique emotion exaggeration control for expressive speech.
Implements PerTh Watermarking for responsible AI, embedding imperceptible watermarks into outputs.
Supports cloning a target voice from a reference clip via the audio_prompt_path argument.
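The exaggeration control and the audio_prompt_path argument listed above can be combined in a single generation call. The sketch below is illustrative only: `build_generate_kwargs` and `speak` are hypothetical helpers, and the keyword names `exaggeration` and `cfg_weight` and their 0.5 defaults are assumptions based on the project's README, not confirmed by this page.

```python
def build_generate_kwargs(audio_prompt_path=None, exaggeration=0.5, cfg_weight=0.5):
    """Collect keyword arguments for a generate() call.

    Assumption: generate() accepts `exaggeration`, `cfg_weight` and
    `audio_prompt_path`; higher exaggeration means more intense delivery.
    """
    if exaggeration < 0:
        raise ValueError("exaggeration must be non-negative")
    kwargs = {"exaggeration": float(exaggeration), "cfg_weight": float(cfg_weight)}
    if audio_prompt_path is not None:
        # Reference clip whose voice the output should imitate.
        kwargs["audio_prompt_path"] = str(audio_prompt_path)
    return kwargs


def speak(model, text, **options):
    """Call model.generate with the validated options and return its result."""
    return model.generate(text, **build_generate_kwargs(**options))
```

With a loaded model, `speak(model, "What a day!", exaggeration=0.7, audio_prompt_path="reference.wav")` would request a more dramatic reading in the reference speaker's voice, where `reference.wav` is a placeholder path.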

Maintenance & Community

Developed by Resemble AI.
Active Discord community for support and collaboration.

Licensing & Compatibility

Licensed under MIT, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Currently supports only English language synthesis.
Despite the benchmark results, output quality can vary with the quality of the prompt audio and the speaking style.

Health Check

Last Commit: 3 weeks ago
Responsiveness: 1 day
Pull Requests (30d): 13
Issues (30d): 9

Star History

971 stars in the last 30 days

Explore Similar Projects

Starred by Luis Capelo (Cofounder of Lightning AI).

VoiceStar by jasonppy

Robust, duration-controllable TTS that extrapolates

Created 10 months ago

Updated 9 months ago

voicebox-pytorch by lucidrains

PyTorch implementation of Meta AI's Voicebox text-to-speech model

Created 2 years ago

Updated 1 year ago

Step-Audio-EditX by stepfun-ai

LLM-driven audio model for expressive editing and TTS

Created 3 months ago

Updated 1 week ago

VITA-Audio by VITA-MLLM

Speech model for fast audio-text token generation

Created 10 months ago

Updated 9 months ago

FireRedTTS by FireRedTeam

LLM-empowered TTS system for research

Created 1 year ago

Updated 5 months ago

Kokoros by lucasjinreal

Rust crate for fast, high-quality TTS

Created 1 year ago

Updated 1 month ago

GLM-TTS by zai-org

Controllable, emotion-expressive zero-shot TTS

Created 2 months ago

Updated 2 months ago

Starred by Benjamin Bolte (Cofounder of K-Scale Labs).

speech-synthesis-paper by wenet-e2e

Speech synthesis papers list

Created 5 years ago

Updated 2 years ago

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm).

MARS5-TTS by Camb-ai

Speech model (TTS) for prosody generation

Created 1 year ago

Updated 1 year ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

TTS model for human-like, expressive speech

Created 2 years ago

Updated 1 year ago

Starred by Tobi Lutke (Cofounder of Shopify), Luis Capelo (Cofounder of Lightning AI), and 2 more.

VoiceCraft by jasonppy

Zero-shot speech editing and TTS research paper

Created 1 year ago

Updated 11 months ago

Starred by Michael Han (Cofounder of Unsloth), Alex Chen (Cofounder of Nexa AI), and 12 more.

dia by nari-labs

TTS model for ultra-realistic dialogue generation

Created 10 months ago

Updated 3 months ago
