ComfyUI-Index-TTS  by chenpipi0807

High-quality text-to-speech in ComfyUI

Created 6 months ago
496 stars

Top 62.4% on SourcePulse

Project Summary

This repository provides custom ComfyUI nodes for high-quality text-to-speech (TTS) using the IndexTTS model. It targets users of ComfyUI, particularly those interested in voice cloning and generating speech in both Chinese and English, offering a streamlined workflow for creative applications.

How It Works

The nodes wrap the IndexTTS model and clone a voice by analyzing a reference audio sample and reproducing its characteristics. They accept both Chinese and English text and expose controls for speech speed and other synthesis parameters. The project also includes a "Novel Text Structure Node" that parses narrative (novel) text into a multi-character dialogue format, which helps when producing audiobooks or multi-voice narration.
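To make the idea of turning narrative text into per-speaker segments concrete, here is a minimal sketch. It is not the repository's parsing algorithm: the function name structure_narrative, the (speaker, text) output format, and the small list of attribution verbs are all assumptions chosen for illustration, and it only handles straight double quotes in English text.

```python
import re

# Hypothetical illustration only; this is NOT the node's actual algorithm.
QUOTE_RE = re.compile(r'"([^"]+)"')
# Matches simple attributions such as `asked Alice` or `Bob replied`.
ATTRIBUTION_RE = re.compile(
    r"\b(?:said|asked|replied)\s+([A-Z][a-z]+)"
    r"|\b([A-Z][a-z]+)\s+(?:said|asked|replied)\b"
)


def structure_narrative(text, narrator="Narrator", unknown="Unknown"):
    """Split narrative text into (speaker, text) pairs, one line at a time."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        quotes = QUOTE_RE.findall(line)
        if not quotes:
            # No quoted speech: the whole line goes to the narrator voice.
            segments.append((narrator, line))
            continue
        # Look for a speaker name in the part of the line outside the quotes.
        outside = QUOTE_RE.sub("", line)
        match = ATTRIBUTION_RE.search(outside)
        speaker = (match.group(1) or match.group(2)) if match else unknown
        for quote in quotes:
            # The attribution text itself is dropped in this sketch.
            segments.append((speaker, quote.strip().rstrip(",")))
    return segments


if __name__ == "__main__":
    sample = (
        "The rain had stopped.\n"
        '"Are you coming?" asked Alice.\n'
        '"Not yet," Bob replied.'
    )
    for speaker, line in structure_narrative(sample):
        print(f"{speaker}: {line}")
```

Each resulting pair could then be routed to a different reference voice. A rule-based approach like this fails quickly on nested quotes, pronouns, or unattributed dialogue, which is the kind of input where any such parser is likely to misidentify speakers.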

Quick Start & Requirements

  • Installation: Clone the repository into ComfyUI's custom_nodes directory, then install the dependencies. With the Windows portable build of ComfyUI, use its embedded interpreter: .\python_embeded\python.exe -m pip install -r requirements.txt.
  • Models: Download the Index-TTS or IndexTTS-1.5 model files from Hugging Face or ModelScope (Modao) and place them in ComfyUI/models/Index-TTS or ComfyUI/models/IndexTTS-1.5, respectively.
  • Dependencies: Python, PyTorch. CUDA is recommended for GPU acceleration.
  • Documentation: Workflow example is provided.
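
A quick way to confirm that the model files landed in the expected place is to check the two directories named above. The sketch below is a hypothetical helper, not part of the project: the function name find_installed_models is an assumption, and it only checks that each directory exists and is non-empty, not that the files inside are complete or valid.

```python
from pathlib import Path

# Directory names are taken from the setup notes above.
# The helper itself is hypothetical and not shipped with the nodes.
MODEL_DIRS = ("Index-TTS", "IndexTTS-1.5")


def find_installed_models(comfyui_root):
    """Return {model_dir_name: True/False} for each expected model folder.

    A folder counts as present only if it exists and contains at least
    one entry; file contents are not validated.
    """
    models_root = Path(comfyui_root) / "models"
    status = {}
    for name in MODEL_DIRS:
        model_dir = models_root / name
        status[name] = model_dir.is_dir() and any(model_dir.iterdir())
    return status


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; pass the path to your own installation.
    for name, present in find_installed_models("ComfyUI").items():
        print(f"{name}: {'found' if present else 'missing or empty'}")
```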

Highlighted Details

  • Supports voice cloning from reference audio.
  • Includes a "Novel Text Structure Node" for parsing multi-character narrative text.
  • Offers an "Audio Cleaner" node for denoising and de-reverberating output audio.
  • Optimized for Windows, with no additional dependencies required.
  • Supports switching between Index-TTS and IndexTTS-1.5 models.

Maintenance & Community

The project is actively updated, with recent changes focusing on text parsing, model compatibility, and audio processing enhancements. Links to community support or discussion channels are not explicitly provided in the README.

Licensing & Compatibility

The README defers licensing to the original IndexTTS project. Users should check the upstream license terms before any commercial use.

Limitations & Caveats

The novel text parsing algorithm is imperfect and may misidentify characters in complex narrative structures. The README also notes compatibility issues with PyTorch 2.7; the suggested workaround is to downgrade the transformers library.
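
To see whether an environment falls under the PyTorch 2.7 caveat, the installed versions can be inspected before loading the nodes. This is a minimal sketch under stated assumptions: the helper names is_torch_2_7 and report_environment are hypothetical, the check only matches 2.7.x releases, and it deliberately does not name a target transformers version because none is given here.

```python
from importlib import metadata


def is_torch_2_7(version_string):
    """Return True for version strings such as '2.7.0' or '2.7.1+cu126'."""
    # Drop any local build tag after '+', then compare major and minor parts.
    parts = version_string.split("+")[0].split(".")
    return parts[:2] == ["2", "7"]


def report_environment():
    """Describe the installed torch/transformers versions as a short message."""
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "torch is not installed in this environment"
    try:
        transformers_version = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        transformers_version = "not installed"
    if is_torch_2_7(torch_version):
        return (
            f"torch {torch_version} detected (transformers: {transformers_version}); "
            "see the project README for the transformers downgrade workaround"
        )
    return f"torch {torch_version} detected; not flagged by this check"


if __name__ == "__main__":
    print(report_environment())
```

With the Windows portable build, such a script would be run with the embedded interpreter so that it inspects the same environment ComfyUI uses.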

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 11
  • Star history: 58 stars in the last 30 days
