TTS-Audio-Suite by diodiogod

ComfyUI extension for advanced Text-to-Speech and Voice Conversion

Created 3 months ago
375 stars

Top 75.7% on SourcePulse

Project Summary

This ComfyUI custom node suite provides a unified, multi-engine platform for Text-to-Speech (TTS) and Voice Conversion (VC). It integrates multiple TTS and VC engines so that users can generate and manipulate audio without leaving the ComfyUI ecosystem.

How It Works

The suite is built on a modular, universal streaming architecture that integrates multiple TTS engines (Chatterbox, F5-TTS, Higgs Audio 2, VibeVoice, IndexTTS-2) and VC engines (RVC, ChatterBox VC). Key functionality includes advanced text processing with character/language switching, SRT timing synchronization, voice cloning, and experimental silent video analysis.
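To make the SRT timing synchronization mentioned above more concrete, here is a minimal sketch of how subtitle cues could be read into start/end times in seconds before speech is generated for each cue. It follows the standard SRT layout only; the names Cue and parse_srt are hypothetical and are not taken from the suite's code.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:  # hypothetical container, not the suite's own type
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Return the cues found in an SRT document, in file order."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the spoken text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
                break
    return cues
```

Each cue's end minus start gives the time slot that the generated audio for that cue would need to fit; how the suite itself stretches or pads audio to match is not described here.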

Quick Start & Requirements

  • Primary install / run command: Installation through ComfyUI Manager is recommended, since it handles installation and dependency management in one click. Manual installation involves cloning the repository and then either running an installer script or running pip install -r requirements.txt.
  • Non-default prerequisites and dependencies: ComfyUI installation, Python 3.12+, and specific system libraries for Linux/macOS (e.g., portaudio). Windows requires no additional system dependencies. Many engine models auto-download on first use; manual downloads are also supported.

Highlighted Details

  • Extensive Engine Support: Integrates Chatterbox (official 23-language support), F5-TTS, Higgs Audio 2 (high-fidelity cloning), VibeVoice (long-form generation), IndexTTS-2 (emotion control), and RVC (real-time VC).
  • Advanced Features: Supports SRT timing, character/language switching syntax, pause tags, unlimited text length with smart chunking, and iterative voice conversion refinement.
  • Voice Cloning: Higgs Audio 2 enables cloning from short audio samples.
  • Emotion Control: IndexTTS-2 offers dynamic emotion expression via text analysis or direct control.
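As an illustration of the "smart chunking" idea listed above, the following is a minimal sketch that splits long text at sentence boundaries and packs sentences into chunks under a character budget. The function name chunk_text, the max_chars parameter, and the punctuation rule are assumptions made for this example; the suite's actual chunking logic may differ.

```python
import re


def chunk_text(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be sent to a TTS engine separately and the resulting audio concatenated, which is one way that text of arbitrary length can be handled by engines with limited input size.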

Maintenance & Community

Developed by diodiogod, the suite evolved from ShmuelRonen's ChatterBox Voice project and credits ResembleAI for ChatterboxTTS. The GitHub repository serves as the primary community hub.

Licensing & Compatibility

Released under the MIT License, permitting broad usage, including commercial applications. It functions as a ComfyUI custom node, with noted compatibility adjustments for Python 3.13.

Limitations & Caveats

The Silent Speech Analyzer and OpenSeeFace provider for mouth movement detection are experimental. F5-TTS requires adherence to specific best practices for optimal generation quality and to avoid inference failures. Large model downloads are necessary for many engines.

Health Check

  • Last commit: 20 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 30
  • Star history: 79 stars in the last 30 days
