diodiogod: ComfyUI extension for advanced Text-to-Speech and Voice Conversion
Top 75.7% on SourcePulse
This ComfyUI custom node suite provides a unified, multi-engine platform for advanced Text-to-Speech (TTS) and Voice Conversion (VC). It integrates numerous state-of-the-art engines, offering flexible, high-quality audio generation and manipulation capabilities for users within the ComfyUI ecosystem.
How It Works
The suite features a modular, universal streaming architecture, enabling seamless integration of diverse TTS (Chatterbox, F5-TTS, Higgs Audio 2, VibeVoice, IndexTTS-2) and VC (RVC, ChatterBox VC) engines. Key functionalities include advanced text processing with character/language switching, SRT timing synchronization, voice cloning, and experimental silent video analysis.
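To make the SRT timing synchronization idea concrete, here is a minimal sketch in Python. It is not the extension's own code: the names Cue, parse_srt, and stretch_ratio are hypothetical, and the sketch assumes synchronization amounts to fitting each generated clip into its subtitle's time slot. It parses standard SRT cues and computes the speed factor a clip would need to fit its slot.

```python
import re
from dataclasses import dataclass

# SRT timestamps look like 00:00:01,500 (hours:minutes:seconds,milliseconds).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    """One subtitle entry (hypothetical helper type, not part of the extension)."""
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str

    @property
    def duration(self) -> float:
        return self.end - self.start


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; blocks with fewer than three lines are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split("-->")
        cues.append(
            Cue(int(lines[0]), _to_seconds(start), _to_seconds(end), "\n".join(lines[2:]))
        )
    return cues


def stretch_ratio(cue: Cue, audio_seconds: float) -> float:
    """Speed factor that makes a clip fit the cue's slot (above 1 means speed up)."""
    if cue.duration <= 0:
        raise ValueError("cue has no positive duration")
    return audio_seconds / cue.duration
```

For example, a clip that runs 5 seconds for a cue spanning 2.5 seconds would need to play at twice its natural speed. How the suite actually resolves such mismatches is not described here.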
Quick Start & Requirements
Install the Python dependencies with pip install -r requirements.txt. Other platforms may need system libraries such as portaudio; Windows requires no additional system dependencies. Many engine models auto-download on first use; manual downloads are also supported.
Highlighted Details
Maintenance & Community
Developed by diodiogod, evolving from ShmuelRonen's ChatterBox Voice project. The project credits ResembleAI for ChatterboxTTS. The GitHub repository serves as the primary community hub.
Licensing & Compatibility
Released under the MIT License, permitting broad usage, including commercial applications. It functions as a ComfyUI custom node, with noted compatibility adjustments for Python 3.13.
Limitations & Caveats
The Silent Speech Analyzer and OpenSeeFace provider for mouth movement detection are experimental. F5-TTS requires adherence to specific best practices for optimal generation quality and to avoid inference failures. Large model downloads are necessary for many engines.