Chatterbox-TTS-Extended by petermg

TTS pipeline for advanced speech synthesis

created 2 months ago
339 stars

Top 82.4% on sourcepulse

Project Summary

Chatterbox-TTS-Extended is a comprehensive Text-to-Speech (TTS) pipeline designed for advanced single and batch audio generation, voice conversion, and artifact reduction. It targets power users and researchers seeking fine-grained control over speech synthesis quality and workflow automation. The project extends the original Chatterbox-TTS with multi-file input, custom candidate generation, advanced audio post-processing, and a persistent UI.

How It Works

The pipeline is modular, so users can control each stage of TTS generation. Key features include multi-file input with flexible output naming, reference audio conditioning for voice style imitation, and text preprocessing for common problem cases such as abbreviations and numbers. Audio post-processing uses Auto-Editor for silence trimming and FFmpeg for loudness normalization (EBU R128 or peak). For validation, a Whisper or faster-whisper backend transcribes the generated audio and compares the transcript against the intended text to ensure quality and reduce artifacts. The system supports parallel processing for faster batch jobs and provides a persistent Gradio UI for managing settings and previewing outputs.
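The transcript check described above can be illustrated with a minimal sketch. This is not the project's actual code: the function names, the normalization rules, the use of difflib for scoring, and the 0.9 threshold are all assumptions made for illustration. The transcripts are assumed to come from a Whisper or faster-whisper run on each generated candidate.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def transcript_score(intended: str, transcript: str) -> float:
    """Similarity in [0, 1] between the intended text and a transcript."""
    return SequenceMatcher(None, normalize(intended), normalize(transcript)).ratio()


def pick_best(intended: str, transcripts: list[str], threshold: float = 0.9):
    """Return (index, score, passed) for the closest candidate transcript.

    `transcripts` must be non-empty. `threshold` is a hypothetical cutoff;
    a candidate scoring below it would be treated as a likely artifact.
    """
    scores = [transcript_score(intended, t) for t in transcripts]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best], scores[best] >= threshold


if __name__ == "__main__":
    intended = "Hello, world! This is a test."
    candidates = [
        "hello word this is a toast",   # mis-synthesized candidate
        "Hello world, this is a test",  # clean candidate
    ]
    print(pick_best(intended, candidates))
```

A character-level ratio is a simple stand-in here; a word error rate metric would be a reasonable alternative if stricter matching is needed.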

Quick Start & Requirements

  • Primary install / run command: python Chatter.py
  • Non-default prerequisites and dependencies: Python 3.10.x, FFmpeg.
  • Documentation: Installation instructions are provided in the README.

Highlighted Details

  • Supports multi-file input and batch output with customizable naming conventions.
  • Integrates Whisper/faster-whisper for transcript validation and artifact detection.
  • Offers a full-featured Gradio UI with persistent settings and audio preview.
  • Includes advanced audio post-processing with Auto-Editor and FFmpeg normalization.
  • Features a dedicated Voice Conversion (VC) tab for voice style transfer.
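As a rough illustration of the FFmpeg normalization step, the sketch below builds a command line around FFmpeg's `loudnorm` filter, which implements EBU R128 loudness normalization. The function name and default targets are assumptions, not values taken from the project; the project may invoke FFmpeg differently.

```python
def loudnorm_command(src: str, dst: str,
                     target_lufs: float = -23.0,
                     true_peak_db: float = -1.0,
                     loudness_range: float = 11.0) -> list[str]:
    """Build an FFmpeg argument list for single-pass loudnorm.

    The defaults are illustrative (-23 LUFS is the EBU R128 programme
    target); they are not known to match the project's settings.
    """
    # Ranges accepted by the loudnorm filter's I, TP, and LRA options.
    if not -70.0 <= target_lufs <= -5.0:
        raise ValueError("target_lufs must be between -70 and -5")
    if not -9.0 <= true_peak_db <= 0.0:
        raise ValueError("true_peak_db must be between -9 and 0")
    if not 1.0 <= loudness_range <= 50.0:
        raise ValueError("loudness_range must be between 1 and 50")
    audio_filter = f"loudnorm=I={target_lufs}:TP={true_peak_db}:LRA={loudness_range}"
    # -y overwrites the output file if it already exists.
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


if __name__ == "__main__":
    print(" ".join(loudnorm_command("in.wav", "out.wav")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once FFmpeg is available on PATH.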

Maintenance & Community

The project is maintained by petermg. Feedback and contributions are welcomed via GitHub issues and pull requests.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires Python 3.10.x and FFmpeg, and FFmpeg may need to be added to PATH manually. The absence of an explicit license may complicate commercial use.
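Both requirements can be checked before launching the application. The sketch below is a hypothetical pre-flight check, not part of the project; the function name and messages are invented for illustration, and it uses only the Python standard library.

```python
import shutil
import sys


def environment_problems(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; empty means the checks passed.

    `version_info` and `which` are injectable so the check can be tested
    without changing the real interpreter or PATH.
    """
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 10):
        problems.append(f"Python 3.10.x required, found {major}.{minor}")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in environment_problems():
        print("WARNING:", problem)
```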

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 8

Star History

340 stars in the last 90 days
