ComfyUI-Index-TTS by chenpipi0807

High-quality text-to-speech in ComfyUI

Created 10 months ago

642 stars

Top 51.7% on SourcePulse

Project Summary

This repository provides custom ComfyUI nodes for high-quality text-to-speech (TTS) using the IndexTTS model. It targets users of ComfyUI, particularly those interested in voice cloning and generating speech in both Chinese and English, offering a streamlined workflow for creative applications.

How It Works

The nodes integrate the IndexTTS model, which clones a voice by analyzing a reference audio sample and replicating its characteristics. They support both Chinese and English text and expose controls for speech speed and other synthesis parameters. The project also includes a "Novel Text Structure Node" that parses narrative (novel) text into a multi-character dialogue format, which helps when producing audiobooks or multi-voice narratives.
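To make the idea of parsing narrative text into per-speaker segments concrete, here is a minimal sketch. It is not the node's actual algorithm: the function name `split_dialogue`, the regular expressions, the English-only attribution verbs, and the `(speaker, text)` output shape are all assumptions chosen for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a span enclosed in straight or curly double quotes.
QUOTE = re.compile(r'["“]([^"”]+)["”]')
# Hypothetical attribution cue: "Name said" or "said Name" (also asked/replied).
SPEAKER = re.compile(
    r'\b(?:([A-Z][a-z]+) (?:said|asked|replied)'
    r'|(?:said|asked|replied) ([A-Z][a-z]+))\b'
)


def split_dialogue(text: str, narrator: str = "Narrator") -> List[Tuple[str, str]]:
    """Split narrative text into ordered (speaker, text) segments.

    Illustrative heuristic only: quoted spans become dialogue, everything
    else on the line becomes narration.
    """
    segments: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Look for the speaker in the part of the line outside the quotes.
        narration = QUOTE.sub(" ", line)
        match = SPEAKER.search(narration)
        speaker = (match.group(1) or match.group(2)) if match else "Unknown"
        pos = 0
        for quote in QUOTE.finditer(line):
            before = line[pos:quote.start()].strip(" ,")
            if before:
                segments.append((narrator, before))
            segments.append((speaker, quote.group(1).strip()))
            pos = quote.end()
        tail = line[pos:].strip(" ,")
        if tail:
            segments.append((narrator, tail))
    return segments


if __name__ == "__main__":
    sample = 'Alice said, "Hello there."\nThe room was quiet.\n"Who is it?" asked Bob.'
    for speaker, chunk in split_dialogue(sample):
        print(f"{speaker}: {chunk}")
```

A heuristic this simple fails easily: a line such as `She said, "Fine."` would be attributed to a speaker named "She", and quotes without any attribution fall back to "Unknown". Each resulting segment could then be paired with a different reference voice.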

Quick Start & Requirements

Installation: Clone the repository into ComfyUI's custom_nodes directory, then install its dependencies. With the Windows portable build of ComfyUI, which ships an embedded Python, the command is .\python_embeded\python.exe -m pip install -r requirements.txt.
Models: Download the Index-TTS or IndexTTS-1.5 model files from Hugging Face or ModelScope and place them in ComfyUI/models/Index-TTS or ComfyUI/models/IndexTTS-1.5, respectively.
Dependencies: Python, PyTorch. CUDA is recommended for GPU acceleration.
Documentation: A workflow example is provided.
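A quick way to confirm the model placement described above is to check that the expected folders exist and are not empty. The sketch below assumes only the two directory names given in this section. The helper name `find_model_dirs` is hypothetical, and it does not verify individual model files because their names are not listed here.

```python
from pathlib import Path
from typing import Dict

# Directory names taken from the model placement step in this section.
MODEL_DIRS = ("Index-TTS", "IndexTTS-1.5")


def find_model_dirs(comfy_root: str) -> Dict[str, bool]:
    """Report which model folders exist under <comfy_root>/models and contain files."""
    models = Path(comfy_root) / "models"
    status: Dict[str, bool] = {}
    for name in MODEL_DIRS:
        folder = models / name
        # A folder counts as present only if it exists and is not empty.
        status[name] = folder.is_dir() and any(folder.iterdir())
    return status


if __name__ == "__main__":
    # "ComfyUI" is a placeholder path; point it at your own installation.
    for name, present in find_model_dirs("ComfyUI").items():
        print(f"{name}: {'found' if present else 'missing or empty'}")
```

Only one of the two folders needs to be populated, depending on which model version you intend to use.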

Highlighted Details

Supports voice cloning from reference audio.
Includes a "Novel Text Structure Node" for parsing multi-character narrative text.
Offers an "Audio Cleaner" node for denoising and de-reverberating output audio.
Optimized for Windows, with no additional dependencies required.
Supports switching between Index-TTS and IndexTTS-1.5 models.

Maintenance & Community

The project is actively updated, with recent changes focusing on text parsing, model compatibility, and audio processing enhancements. The README does not link to any community support or discussion channels.

Licensing & Compatibility

The README defers licensing to the original IndexTTS project. Users should check that project's license terms before any commercial use.

Limitations & Caveats

The parsing algorithm in the Novel Text Structure Node is imperfect and may misidentify characters in complex narrative structures. The README also notes compatibility issues with PyTorch 2.7 and offers a workaround: downgrading the transformers library.
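To find out whether an environment falls under the PyTorch 2.7 caveat, you can inspect the installed package versions. The following is a minimal sketch: the helper names `installed_version` and `is_torch_2_7` are hypothetical, and the target transformers version for the downgrade is deliberately left out because it is not stated here.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_torch_2_7(version: Optional[str]) -> bool:
    """True if the version string denotes a 2.7.x release (local tags like +cu126 ignored)."""
    if not version:
        return False
    parts = version.split("+")[0].split(".")
    return parts[:2] == ["2", "7"]


if __name__ == "__main__":
    torch_version = installed_version("torch")
    print("torch:", torch_version)
    print("transformers:", installed_version("transformers"))
    if is_torch_2_7(torch_version):
        print("PyTorch 2.7 detected; see the README for the transformers downgrade workaround.")
```

The check reads package metadata rather than importing torch, so it runs quickly and works even when the GPU stack is misconfigured.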

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Star History

15 stars in the last 30 days