ComfyUI_IndexTTS  by billwuhao

High-fidelity voice cloning and dialogue generation

Created 4 months ago
292 stars

Top 90.4% on SourcePulse

Project Summary

This ComfyUI custom node integrates IndexTTS, a fast, high-quality voice cloning and speech synthesis system. It targets ComfyUI users who need advanced text-to-speech capabilities: realistic voice cloning, Chinese and English support, nuanced emotional expression, and two-person dialogue generation.

How It Works

The node wraps the IndexTTS model for voice cloning and speech synthesis. It captures and replicates custom voice timbres quickly and with high fidelity. Emotional expression can be controlled through audio prompts, text prompts, or vector manipulation, which allows dynamic, natural-sounding speech, including multi-speaker dialogue.

Quick Start & Requirements

  • Installation: Clone the repository into your ComfyUI custom nodes directory:
    cd ComfyUI/custom_nodes
    git clone https://github.com/billwuhao/ComfyUI_IndexTTS.git
    cd ComfyUI_IndexTTS
    pip install -r requirements.txt
    
    On Windows, pynini needs separate installation steps that use the provided wheel files and WeTextProcessing.
  • Prerequisites: ComfyUI, Python, and the dependencies listed in requirements.txt. A GPU is highly recommended for performance.
  • Models: Model files must be downloaded manually (e.g., bigvgan_generator.pth, bpe.model, gpt.pth for v1.5; various components from Hugging Face for v2) and placed in the ComfyUI/models/TTS/Index-TTS directory.
  • Documentation: Links to model downloads are provided within the README.
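
Because the model files are placed by hand, a quick check before the first run can catch a missing download. The following is a minimal sketch, not part of the node: it assumes the v1.5 file names given as examples above (the list is not exhaustive) and that the script runs from the directory containing ComfyUI/. The function name missing_model_files is hypothetical.

```python
from pathlib import Path

# File names taken from the v1.5 examples listed above; the list is not exhaustive.
EXPECTED_V15_FILES = ("bigvgan_generator.pth", "bpe.model", "gpt.pth")


def missing_model_files(model_dir, expected=EXPECTED_V15_FILES):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed to be run from the directory that contains ComfyUI/.
    model_dir = Path("ComfyUI") / "models" / "TTS" / "Index-TTS"
    missing = missing_model_files(model_dir)
    if missing:
        print(f"Missing in {model_dir}: {', '.join(missing)}")
    else:
        print(f"All expected v1.5 files found in {model_dir}")
```

For v2, the same function could be called with a different tuple of names once the required Hugging Face components are known.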

Highlighted Details

  • Supports high-fidelity voice cloning and fast speech generation.
  • Enables custom timbre selection and flexible control over emotional expression.
  • Features robust support for two-person dialogue generation.
  • Recent updates include support for IndexTTS2 (v2.0.0) and improved Windows compatibility.
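
To illustrate what two-person dialogue generation involves on the input side, the sketch below splits a tagged script into ordered speaker turns. The [S1] and [S2] tags and the split_dialogue function are assumptions made for illustration; the node's actual dialogue syntax is not described here and may differ.

```python
import re

# Hypothetical speaker tags; the node's real dialogue syntax may differ.
TURN_PATTERN = re.compile(r"\[(S1|S2)\]")


def split_dialogue(script):
    """Split a tagged script into (speaker, text) turns, in order."""
    # re.split with one capture group yields: [prefix, tag, text, tag, text, ...]
    parts = TURN_PATTERN.split(script)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # skip tags that are not followed by any text
            turns.append((speaker, text))
    return turns


if __name__ == "__main__":
    script = "[S1] Did you download the models? [S2] Yes, all of them."
    for speaker, text in split_dialogue(script):
        print(f"{speaker}: {text}")
```

Each turn could then be synthesized with the reference voice assigned to its speaker and the resulting clips joined in order.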

Maintenance & Community

No specific community links (Discord, Slack) or detailed maintenance information (contributors, roadmap) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The node is designed to run within the ComfyUI framework.

Limitations & Caveats

The README notes that DeepSpeed acceleration provides only minimal performance gains. The first run automatically builds custom CUDA kernels, which can lengthen initial startup. Model files must be downloaded and placed manually.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 26
  • Star history: 163 stars in the last 30 days
