ComfyUI-Easy-IndexTTS2 by yolain

Advanced Text-to-Speech generation for ComfyUI

Created 9 months ago

259 stars

Top 97.7% on SourcePulse

Project Summary

This repository provides a set of ComfyUI custom nodes for the IndexTTS2 text-to-speech model. It offers a streamlined workflow, advanced voice cloning, and integrated model management, aimed at ComfyUI users who want more control and flexibility over TTS generation. Compared with the base IndexTTS2 model, it adds in-graph model downloading and unloading, plus voice and emotion referencing by description, audio sample, or vector.

How It Works

This project is a modified version of ComfyUI_Index_TTS, built on the IndexTTS2 architecture. It retains the core logic of the original IndexTTS model but substantially changes the usage flow and adds new nodes. The key design choices are adopting the ComfyUI v3 node paradigm for tighter integration, and providing dedicated nodes for model management (downloading from HuggingFace or ModelScope, unloading) and for voice and emotion referencing (via text descriptions, audio samples, or emotion vectors).

Quick Start & Requirements

Installation:

Clone the repository into your ComfyUI custom nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/yolain/ComfyUI-Easy-IndexTTS2

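Install dependencies (the command below targets the Windows portable build of ComfyUI, whose embedded python_embeded/ folder sits next to the ComfyUI folder; on other installs, run pip with the Python interpreter of your ComfyUI environment):

cd ComfyUI-Easy-IndexTTS2
../../../python_embeded/python.exe -m pip install -r requirements.txt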
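When it is unclear which interpreter should receive the dependencies, the choice can be scripted. The helper below is a hypothetical sketch, not part of the repository: it assumes the Windows portable layout, where python_embeded/ sits next to the ComfyUI folder (three levels above the node directory), and otherwise falls back to the interpreter running the script, which is assumed to be the one ComfyUI uses.

```python
import sys
from pathlib import Path


def pick_python(node_dir):
    """Choose the interpreter for installing this node's requirements.

    Assumed Windows portable layout (hypothetical helper, not from the repo):
      ComfyUI_windows_portable/python_embeded/python.exe
      ComfyUI_windows_portable/ComfyUI/custom_nodes/<node_dir>
    Falls back to the interpreter running this script.
    """
    parents = Path(node_dir).resolve().parents
    if len(parents) > 2:
        # parents[0] = custom_nodes, parents[1] = ComfyUI, parents[2] = portable root
        embedded = parents[2] / "python_embeded" / "python.exe"
        if embedded.is_file():
            return str(embedded)
    return sys.executable


def pip_install_command(node_dir):
    """Build the pip command that installs the node's requirements.txt."""
    requirements = Path(node_dir) / "requirements.txt"
    return [pick_python(node_dir), "-m", "pip", "install", "-r", str(requirements)]


if __name__ == "__main__":
    # Run from inside ComfyUI/custom_nodes/ComfyUI-Easy-IndexTTS2; only prints.
    print(" ".join(pip_install_command(Path.cwd())))
```

To actually install, pass the returned list to subprocess.check_call instead of printing it.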

Prerequisites:
- ComfyUI, updated to a recent version that supports v3 nodes.
- A Python environment (the one ComfyUI runs in).
- Model files, either placed at the designated paths under ComfyUI/models/IndexTTS-2/ or auto-downloaded to ComfyUI/models/IndexTTS-2/hf_cache/.
Models: Requires downloading several model components, including:
- semantic_codec/model.safetensors from https://huggingface.co/amphion/MaskGCT/tree/main/semantic_codec
- campplus_cn_common.bin from https://huggingface.co/funasr/campplus
- w2v-bert-2.0/ folder from https://huggingface.co/facebook/w2v-bert-2.0
- BigVGAN vocoder models (e.g., nvidia/bigvgan_v2_22khz_80band_256x) into bigvgan/.
- Other files that are read directly (gpt.pth, s2mel.pth, bpe.model, wav2vec2bert_stats.pt, qwen0.6bemo4-merge/), plus the base IndexTTS-2 model.
Setup Time: Varies significantly based on download speeds and model sizes.
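Because the required layout is easy to get wrong, a small script can report which components are still missing before ComfyUI starts. This is a hypothetical helper, not part of the repository: the relative paths in EXPECTED are inferred from the component names listed above and may not match the node's actual lookup logic. The optional fetch_missing function uses huggingface_hub's hf_hub_download and snapshot_download, and only covers the three components whose Hugging Face URLs are given above.

```python
from pathlib import Path

# Hypothetical layout under ComfyUI/models/IndexTTS-2/, inferred from the component
# names listed above. The node's actual lookup paths may differ; adjust as needed.
EXPECTED = [
    "semantic_codec/model.safetensors",
    "campplus_cn_common.bin",
    "w2v-bert-2.0",
    "bigvgan",
    "gpt.pth",
    "s2mel.pth",
    "bpe.model",
    "wav2vec2bert_stats.pt",
    "qwen0.6bemo4-merge",
]

# Hugging Face sources taken from the URLs above. A filename of None means the
# entry is a folder, fetched as a whole repository snapshot.
HF_SOURCES = {
    "semantic_codec/model.safetensors": ("amphion/MaskGCT", "semantic_codec/model.safetensors"),
    "campplus_cn_common.bin": ("funasr/campplus", "campplus_cn_common.bin"),
    "w2v-bert-2.0": ("facebook/w2v-bert-2.0", None),
}


def missing_components(model_root):
    """Return the expected entries that do not exist under model_root."""
    root = Path(model_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


def fetch_missing(model_root):
    """Download missing components that have a known Hugging Face source.

    Requires the huggingface_hub package; it is imported here so that the
    checker above works without it.
    """
    from huggingface_hub import hf_hub_download, snapshot_download

    root = Path(model_root)
    for rel in missing_components(root):
        if rel not in HF_SOURCES:
            print(f"{rel}: no source recorded here, place it manually")
            continue
        repo_id, filename = HF_SOURCES[rel]
        if filename is None:
            # Whole repository -> model_root/<rel>/
            snapshot_download(repo_id=repo_id, local_dir=str(root / rel))
        else:
            # With local_dir, the file keeps its in-repo path: model_root/<filename>
            hf_hub_download(repo_id=repo_id, filename=filename, local_dir=str(root))


if __name__ == "__main__":
    models = Path("ComfyUI/models/IndexTTS-2")
    print("missing:", missing_components(models))
```

Keeping the check separate from the download means the script can be run offline first; BigVGAN and the IndexTTS-2 files still need to be placed by hand or via the node's own downloader.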

Highlighted Details

Nodes for downloading and loading models from HuggingFace or ModelScope.
Model unloading functionality.
Advanced voice cloning via reference audio, description, or emotion vectors.
Support for inserting pauses between dialogue segments with inline markers (e.g., -0.5s- for a 0.5-second pause).
Timed text segments using bracket notation (start, end) for subtitle alignment.
Compatibility with ComfyUI v3 node paradigms.
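The pause syntax can be illustrated with a small parser. This is a hypothetical sketch, not the node's implementation: it assumes a pause marker is a dash, a number of seconds, the letter s, and a closing dash, as in the -0.5s- example, and the 22050 Hz default sample rate is only an assumption used to convert seconds into silent samples. The (start, end) timing notation is not modeled because its exact form is not documented here.

```python
import re

# Assumed marker syntax modeled on the "-0.5s-" example: dash, seconds, "s", dash.
PAUSE = re.compile(r"-(\d+(?:\.\d+)?)s-")


def split_pauses(text):
    """Split text into ("text", str) and ("pause", seconds) items, in order."""
    items = []
    pos = 0
    for match in PAUSE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            items.append(("text", chunk))
        items.append(("pause", float(match.group(1))))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items


def pause_samples(seconds, sample_rate=22050):
    """Number of silent samples for a pause; the sample rate is an assumption."""
    return round(seconds * sample_rate)
```

For example, split_pauses("Hello there.-0.5s-How are you?") yields a text item, a 0.5-second pause item, and a second text item, which a TTS loop could render as speech, silence, speech.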

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), sponsorships, or roadmap are provided in the README.

Licensing & Compatibility

The project is provided "as is" with no explicit warranties. A disclaimer states the author and copyright holders are not liable for any claims, damages, or responsibilities arising from its use. It strictly prohibits illegal use and copyright infringement, placing responsibility on the user to comply with all applicable laws and regulations. No specific open-source license (e.g., MIT, Apache) is mentioned, and compatibility for commercial use or closed-source linking is not addressed.

Limitations & Caveats

The project includes a broad disclaimer of liability, making users solely responsible for legal compliance and any issues arising from usage. It requires a recent ComfyUI version to function correctly due to its reliance on v3 node paradigms. The extensive list of required model files and their specific placement can be complex to set up.

Health Check

Last Commit: 6 months ago

Responsiveness: Inactive

Star History: 3 stars in the last 30 days