Speech synthesis and voice cloning node for ComfyUI
Top 90.3% on SourcePulse
Summary
ComfyUI-VoxCPM integrates VoxCPM, a novel tokenizer-free Text-to-Speech (TTS) system, into the ComfyUI workflow. It enables highly expressive speech generation and true-to-life zero-shot voice cloning, offering advanced audio synthesis capabilities for researchers and power users. The node automates model management and provides fine-grained control over audio output.
How It Works
VoxCPM models speech in a continuous space using the MiniCPM-4 backbone, enabling context-aware prosody and emotional tone generation without traditional tokenization. This approach facilitates accurate voice cloning from short audio samples and high-quality zero-shot TTS. The ComfyUI node streamlines integration by handling automatic model downloads, VRAM management, and audio processing, allowing direct generation from text and optional reference audio.
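The "download on first use" behaviour described above can be pictured with a small sketch. This is not the node's actual code: the function name ensure_model and the injected download callable are hypothetical, and only the directory ComfyUI/models/tts/VoxCPM/ comes from this listing.

```python
from pathlib import Path
from typing import Callable

# Directory taken from the listing; everything else in this sketch is an assumption.
MODEL_SUBDIR = Path("models") / "tts" / "VoxCPM"


def ensure_model(comfy_root: Path, download: Callable[[Path], None]) -> Path:
    """Return the model directory, calling `download` only if it is missing or empty."""
    model_dir = Path(comfy_root) / MODEL_SUBDIR
    if model_dir.is_dir() and any(model_dir.iterdir()):
        return model_dir  # already populated, so skip the download
    model_dir.mkdir(parents=True, exist_ok=True)
    download(model_dir)  # hypothetical hook for whatever actually fetches the weights
    return model_dir
```

Passing the downloader in as a callable keeps the sketch independent of any particular hub client; a real node would presumably plug in its own fetch routine here.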
Quick Start & Requirements
Install the node under ComfyUI/custom_nodes/ and run pip install -r requirements.txt to install the dependencies listed in requirements.txt. Models are automatically downloaded to ComfyUI/models/tts/VoxCPM/ on first use.
Highlighted Details
The node exposes generation parameters, including cfg_value and inference_timesteps, and offers phoneme input for precise pronunciation.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The voice cloning capabilities are powerful enough to invite misuse, so users must adhere to ethical and legal standards. The model may become unstable on very long or complex input texts. It was trained primarily on Chinese and English, so performance on other languages is not guaranteed. The node's built-in denoiser (ZipEnhancer) has been removed to align with ComfyUI's modular philosophy.
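Because long inputs are flagged as a stability risk, one common workaround is to split the text into sentence-sized chunks and synthesize them one at a time. The sketch below is an assumption about how a user might pre-process text, not something the listing says the node does; split_text and max_chars are hypothetical names, and the 200-character default is arbitrary.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars where possible."""
    # Latin punctuation must be followed by whitespace (so "3.14" stays intact);
    # CJK punctuation may be followed directly by the next sentence.
    pattern = r"(?<=[.!?])\s+|(?<=[。！？])\s*"
    sentences = [s.strip() for s in re.split(pattern, text) if s.strip()]

    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single sentence longer than max_chars is kept whole rather than cut mid-sentence, so the limit is a target rather than a hard cap. Each chunk could then be fed to the node separately and the resulting audio clips joined.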