Chatterbox-TTS-Extended by petermg

TTS pipeline for advanced speech synthesis

Created 9 months ago

527 stars

Top 59.9% on SourcePulse

Project Summary

Chatterbox-TTS-Extended is a comprehensive Text-to-Speech (TTS) pipeline designed for advanced single and batch audio generation, voice conversion, and artifact reduction. It targets power users and researchers seeking fine-grained control over speech synthesis quality and workflow automation. The project extends the original Chatterbox-TTS with multi-file input, custom candidate generation, advanced audio post-processing, and a persistent UI.

How It Works

The pipeline is modular, so users can control each stage of TTS generation. Key features include multi-file input with flexible output naming, reference audio conditioning for voice style imitation, and extensive text preprocessing for common issues such as abbreviations and numbers. Audio post-processing uses Auto-Editor for silence trimming and FFmpeg for loudness normalization (EBU R128 or peak). A Whisper or faster-whisper backend validates the output: it transcribes each generated clip and compares the transcript against the intended text, so that clips with artifacts or misread words can be detected and rejected. The system supports parallel processing for faster batch jobs and offers a persistent Gradio UI for managing settings and previewing outputs.
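The summary does not specify how transcripts are scored. The following is a minimal sketch, assuming the Whisper or faster-whisper transcripts have already been produced. It scores each candidate against the intended text with Python's difflib and keeps the best one above a cutoff. The function names and the 0.9 threshold are hypothetical and are not taken from the project.

```python
import difflib
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def transcript_similarity(intended: str, transcript: str) -> float:
    """Return a 0..1 similarity ratio between the intended text and an ASR transcript."""
    return difflib.SequenceMatcher(None, normalize(intended), normalize(transcript)).ratio()


def pick_best_candidate(intended: str, transcripts: list[str], threshold: float = 0.9):
    """Return (index, score) of the best candidate, or None if none reaches the threshold.

    The threshold is an illustrative assumption, not a project setting.
    """
    scored = [(i, transcript_similarity(intended, t)) for i, t in enumerate(transcripts)]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best
```

Normalizing before comparison keeps punctuation and casing differences, which ASR output rarely reproduces faithfully, from counting as errors.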

Quick Start & Requirements

Primary run command: python Chatter.py
Non-default prerequisites and dependencies: Python 3.10.x, FFmpeg.
Links to official quick-start, docs, demo, or other relevant pages: installation instructions are provided in the project README.
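Because the project expects Python 3.10.x and an FFmpeg binary on PATH, a quick preflight check can catch setup problems before launching Chatter.py. The script below is a hypothetical helper and is not part of the repository. It only inspects the interpreter version and calls shutil.which.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which) -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    # The project summary lists Python 3.10.x as the expected interpreter.
    if tuple(version_info[:2]) != (3, 10):
        problems.append(f"Python 3.10.x expected, found {version_info[0]}.{version_info[1]}")
    # FFmpeg must be discoverable on PATH; shutil.which returns None if it is not.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Prerequisites look OK; run: python Chatter.py")
```

The version and lookup function are injectable parameters so the check can be exercised without changing the real environment.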

Highlighted Details

Supports multi-file input and batch output with customizable naming conventions.
Integrates Whisper/faster-whisper for transcript validation and artifact detection.
Offers a full-featured Gradio UI with persistent settings and audio preview.
Includes advanced audio post-processing with Auto-Editor and FFmpeg normalization.
Features a dedicated Voice Conversion (VC) tab for voice style transfer.
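For the FFmpeg normalization step, a single-pass EBU R128 pass can be expressed with FFmpeg's loudnorm audio filter. The sketch below only builds the argument list. The function name and the target values (-16 LUFS integrated, -1.5 dBTP true peak, 11 LU loudness range) are illustrative assumptions, not the settings Chatterbox-TTS-Extended actually uses.

```python
def loudnorm_command(src: str, dst: str, integrated: float = -16.0,
                     true_peak: float = -1.5, lra: float = 11.0) -> list[str]:
    """Build a single-pass FFmpeg loudnorm (EBU R128) command as an argument list.

    integrated: target integrated loudness in LUFS
    true_peak:  maximum true peak in dBTP
    lra:        target loudness range in LU
    """
    audio_filter = f"loudnorm=I={integrated}:TP={true_peak}:LRA={lra}"
    # -y overwrites dst without prompting; -af applies the audio filter chain.
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]
```

Pass the list to subprocess.run(cmd, check=True) rather than a shell string. This avoids quoting problems with file names that contain spaces.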

Maintenance & Community

The project is maintained by petermg. Feedback and contributions are welcomed via GitHub issues and pull requests.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires a specific Python version (3.10.x) and FFmpeg, which may need to be added to PATH manually. The absence of an explicit license may complicate use in commercial applications.


Explore Similar Projects

whispering-ui by Sharrnah

csm-mlx by senstella

Auralis by astramind-ai

WhisperS2T by shashikg

Speech-Translate by Dadangdut33

speech-to-text by reriiasu

xtts-api-server by daswer123

mini-omni by gpt-omni

seed-vc by Plachtaa

WhisperLive by collabora

piper by rhasspy

tortoise-tts by neonbjb