flybirdxx
Advanced ComfyUI nodes for speech synthesis and voice AI
Top 50.7% on SourcePulse
Summary
This project provides ComfyUI custom nodes for advanced speech synthesis, zero-shot voice cloning, and voice design, leveraging Alibaba's Qwen3-TTS models. It targets ComfyUI users seeking a node-based workflow for high-quality audio generation, enabling custom voice creation and realistic speech synthesis.
How It Works
The integration brings Qwen3-TTS capabilities into ComfyUI via specialized nodes for TTS, zero-shot voice cloning from short audio, and voice design from natural language descriptions. It supports efficient inference with 12Hz/25Hz tokenizers, features on-demand model loading with global caching, and allows selection from multiple attention mechanisms (sage_attn, flash_attn, sdpa, eager) with auto-detection and fallback. An optional model unloading feature manages GPU memory for limited VRAM users.
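The attention auto-detection and fallback described above could look roughly like the following. This is a minimal sketch and not the project's actual code: the preference order, the module names checked (`sageattention`, `flash_attn`, `torch`), and the helper names `module_available` and `resolve_attention` are all assumptions made for illustration.

```python
import importlib.util
from typing import Callable, Dict, Optional

# Assumed preference order: optional fast backends first, built-in ones last.
PREFERENCE = ("sage_attn", "flash_attn", "sdpa", "eager")

# Assumed mapping from backend name to the module that must be importable.
# None means the backend needs no extra package in this sketch.
REQUIRED_MODULE: Dict[str, Optional[str]] = {
    "sage_attn": "sageattention",
    "flash_attn": "flash_attn",
    "sdpa": "torch",
    "eager": None,
}


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def resolve_attention(
    requested: str = "auto",
    is_available: Callable[[str], bool] = module_available,
) -> str:
    """Pick the requested backend, falling back to slower ones if it is missing."""
    if requested != "auto" and requested not in PREFERENCE:
        raise ValueError(f"unknown attention backend: {requested}")
    # "auto" tries every backend; an explicit choice tries itself, then slower ones.
    start = 0 if requested == "auto" else PREFERENCE.index(requested)
    for backend in PREFERENCE[start:]:
        module = REQUIRED_MODULE[backend]
        if module is None or is_available(module):
            return backend
    return "eager"  # Unreachable here because "eager" needs no module.


if __name__ == "__main__":
    print(resolve_attention("auto"))
```

Passing the availability check as a parameter keeps the selection logic testable on machines without a GPU or the optional packages installed.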
Quick Start & Requirements
pip install torch torchaudio transformers librosa accelerate
Optional performance attention mechanisms (sage_attn, flash_attn) require separate installation.
Highlighted Details
unload_model_after_generate option for VRAM management.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Limited VRAM may require enabling unload_model_after_generate, potentially impacting generation speed if models are frequently swapped.
Best performance depends on the optional attention mechanisms (sage_attn, flash_attn); otherwise, slower built-in options are used.