ComfyUI_IndexTTS by billwuhao

High-fidelity voice cloning and dialogue generation

Created 10 months ago

506 stars

Top 61.7% on SourcePulse

Project Summary

This ComfyUI custom node integrates IndexTTS, a fast, high-quality voice cloning and speech synthesis system. It is aimed at ComfyUI users who need text-to-speech with realistic voice cloning, Chinese and English support, and nuanced emotional expression. It can also generate two-person dialogues.

How It Works

The node wraps the IndexTTS model for voice cloning and speech synthesis. Its main strengths are speed and the fidelity with which it reproduces a custom voice timbre. Emotional expression can be controlled in three ways: through an audio prompt, through a text prompt, or through vector manipulation. Together these allow natural-sounding speech, including multi-speaker dialogue.
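To make the two-person dialogue idea concrete, here is a minimal sketch of how a script could be split into speaker turns, so that each turn can be synthesized with that speaker's reference voice. The `[S1]`/`[S2]` tags and the `split_dialogue` helper are hypothetical and chosen only for illustration; the node's actual input format may differ.

```python
import re
from typing import List, Tuple

# Hypothetical tag convention (not taken from the node): each turn
# starts with [S1] or [S2].
_TURN_TAG = re.compile(r"\[(S[12])\]\s*")


def split_dialogue(script: str) -> List[Tuple[str, str]]:
    """Split a tagged script into ordered (speaker, text) turns."""
    # With one capture group, re.split returns
    # [text_before_first_tag, tag, text, tag, text, ...].
    parts = _TURN_TAG.split(script)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # skip tags that have no text after them
            turns.append((speaker, text))
    return turns


if __name__ == "__main__":
    for speaker, text in split_dialogue("[S1] Hello there. [S2] Hi! [S1] How are you?"):
        print(speaker, "->", text)
```

Keeping the turns as an ordered list preserves the conversation order, which matters when the per-turn audio clips are concatenated afterwards.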

Quick Start & Requirements

Installation: Clone the repository into your ComfyUI custom nodes directory:
```
cd ComfyUI/custom_nodes
git clone https://github.com/billwuhao/ComfyUI_IndexTTS.git
cd ComfyUI_IndexTTS
pip install -r requirements.txt
```
Windows: pynini needs separate installation steps that use the provided wheel files and WeTextProcessing.
Prerequisites: ComfyUI, Python, and the dependencies listed in requirements.txt. A GPU is highly recommended for performance.
Models: Model files must be downloaded manually (e.g., bigvgan_generator.pth, bpe.model, gpt.pth for v1.5; various components from Hugging Face for v2) and placed in the ComfyUI/models/TTS/Index-TTS directory.
Documentation: Links to model downloads are provided within the README.
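Because the model files are placed by hand, a quick check before launching ComfyUI can catch a missing file early. The sketch below is only an illustration: `missing_model_files` is a hypothetical helper, not part of the node, and the file list covers only the v1.5 names given above as examples, so the real set may be longer.

```python
from pathlib import Path
from typing import Iterable, List

# v1.5 file names mentioned in this summary. They are listed as examples,
# so treat this tuple as an assumption rather than a complete manifest.
V15_FILES = ("bigvgan_generator.pth", "bpe.model", "gpt.pth")


def missing_model_files(model_dir, required: Iterable[str] = V15_FILES) -> List[str]:
    """Return the names in `required` that are not regular files in `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    # Path relative to the directory that contains the ComfyUI checkout.
    model_dir = Path("ComfyUI") / "models" / "TTS" / "Index-TTS"
    missing = missing_model_files(model_dir)
    if missing:
        print(f"Missing in {model_dir}: {', '.join(missing)}")
    else:
        print(f"All listed files found in {model_dir}")
```

The helper only tests for file presence; it does not verify checksums or that the files match the expected model version.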

Highlighted Details

Supports high-fidelity voice cloning and fast speech generation.
Enables custom timbre selection and fine-grained control over emotional expression.
Features robust support for two-person dialogue generation.
Recent updates include support for IndexTTS2 (v2.0.0) and improved Windows compatibility.

Maintenance & Community

No specific community links (Discord, Slack) or detailed maintenance information (contributors, roadmap) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The node is designed to run within the ComfyUI framework.

Limitations & Caveats

DeepSpeed acceleration is noted as providing minimal performance gains. The first run automatically builds custom CUDA kernels, which may add to the initial setup time. Model files must be downloaded and placed manually.

Health Check

Last Commit: 3 months ago
Responsiveness: Inactive
Star History: 11 stars in the last 30 days