TTS pipeline for advanced speech synthesis
Chatterbox-TTS-Extended is a comprehensive Text-to-Speech (TTS) pipeline designed for advanced single and batch audio generation, voice conversion, and artifact reduction. It targets power users and researchers seeking fine-grained control over speech synthesis quality and workflow automation. The project extends the original Chatterbox-TTS with multi-file input, custom candidate generation, advanced audio post-processing, and a persistent UI.
How It Works
The pipeline is modular, so users can control each stage of TTS generation. Key features include multi-file input with flexible output naming, reference audio conditioning for voice style imitation, and text preprocessing for common problem cases such as abbreviations and numbers. Audio post-processing uses Auto-Editor for silence trimming and FFmpeg for loudness normalization (EBU R128 or peak). A Whisper or faster-whisper backend validates output by transcribing the generated audio and comparing the transcript against the intended text, which helps catch artifacts before they reach the final file. The system supports parallel processing for faster batch jobs and provides a persistent Gradio UI for managing settings and previewing outputs.
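The transcript-based validation step can be illustrated with a minimal sketch. It is not taken from the project's code: the function names, the word-level similarity metric, and the 0.85 threshold are all assumptions chosen for illustration, and the `transcribe` callable stands in for whatever Whisper or faster-whisper call the pipeline actually makes.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def similarity(intended: str, transcript: str) -> float:
    """Word-level similarity in [0, 1] between intended text and a transcript."""
    a = normalize(intended).split()
    b = normalize(transcript).split()
    return SequenceMatcher(None, a, b).ratio()


def pick_best_candidate(
    intended: str,
    candidates: Iterable[str],
    transcribe: Callable[[str], str],  # hypothetical: audio path -> transcript
    threshold: float = 0.85,           # assumed cutoff, not the project's value
) -> Optional[Tuple[str, float]]:
    """Return (path, score) for the best-matching candidate, or None if all fail."""
    best: Optional[Tuple[str, float]] = None
    for path in candidates:
        score = similarity(intended, transcribe(path))
        if best is None or score > best[1]:
            best = (path, score)
    if best is not None and best[1] >= threshold:
        return best
    return None
```

In this sketch, a `None` result would be the signal to regenerate candidates for that text chunk. Passing the transcriber in as a callable keeps the comparison logic independent of which speech recognition backend is installed.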
Quick Start & Requirements
python Chatter.py
Highlighted Details
Maintenance & Community
The project is maintained by petermg. Feedback and contributions are welcomed via GitHub issues and pull requests.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project requires Python 3.10.x and FFmpeg, and FFmpeg may need to be added to PATH manually. The absence of an explicit license could also create legal uncertainty for commercial use.
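A quick preflight check can confirm both requirements before launching. This is a sketch using only the Python standard library. `check_environment` is a hypothetical helper, not part of the project, and it assumes the FFmpeg executable is named `ffmpeg`.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence


def check_environment(
    version_info: Sequence[int] = sys.version_info,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems: List[str] = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) != (3, 10):
        problems.append(f"Python 3.10.x expected, found {major}.{minor}")
    # shutil.which returns None when the executable is not found on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The interpreter version and the PATH lookup are parameters so the check can be exercised without changing the real environment.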