ComfyUI-Easy-IndexTTS2 by yolain

Advanced Text-to-Speech generation for ComfyUI

Created 8 months ago
251 stars

Top 99.8% on SourcePulse

Project Summary

This repository provides a ComfyUI custom node that enhances the IndexTTS2 text-to-speech model. It offers a streamlined workflow, advanced voice cloning capabilities, and integrated model management, targeting ComfyUI users seeking greater control and flexibility in TTS generation. The primary benefit is an improved user experience and expanded functionality over the base IndexTTS2 model within the ComfyUI ecosystem.

How It Works

This project is a modified version of ComfyUI_Index_TTS, built upon the IndexTTS2 architecture. While retaining the core logic of the original IndexTTS model, it introduces a significantly adjusted usage flow and new nodes. Key architectural choices include adapting to the ComfyUI v3 node paradigm for better integration and developing specific nodes for managing models (downloading from HuggingFace/ModelScope, unloading) and advanced voice/emotion referencing (using descriptions, audio samples, or vectors).

Quick Start & Requirements

  • Installation:
    1. Clone the repository into your ComfyUI custom nodes directory:
      cd ComfyUI/custom_nodes
      git clone https://github.com/yolain/ComfyUI-Easy-IndexTTS2
      
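    2. Install dependencies (the command below calls the embedded Python of the Windows portable ComfyUI build; in other setups, run pip from the Python environment ComfyUI uses):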
      cd ComfyUI-Easy-IndexTTS2
      ../../python_embeded/python.exe -m pip install -r requirements.txt
      
  • Prerequisites:
    • ComfyUI (updated to a recent version supporting v3 nodes).
    • Python environment.
    • Specific model files must be placed in designated paths within ComfyUI/models/IndexTTS-2/ or will be auto-downloaded to ./ComfyUI/models/IndexTTS-2/hf_cache/.
  • Models: Requires downloading several model components; the specific files are listed in the repository README.
  • Setup Time: Varies significantly based on download speeds and model sizes.
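
The two model locations named in the prerequisites can be expressed as a small path helper. This is a minimal sketch, assuming the directory layout described above. The function names index_tts2_dirs and has_local_models are hypothetical and are not part of the node's code. The check only tests whether the model folder contains anything besides the cache folder, since the required file names are not listed here.

```python
from pathlib import Path


def index_tts2_dirs(comfy_root):
    """Return (model_dir, hf_cache_dir) under a ComfyUI root folder."""
    # Layout assumed from the prerequisites: ComfyUI/models/IndexTTS-2/[hf_cache/]
    model_dir = Path(comfy_root) / "models" / "IndexTTS-2"
    return model_dir, model_dir / "hf_cache"


def has_local_models(comfy_root):
    """True if the model folder holds anything besides the hf_cache folder."""
    model_dir, cache_dir = index_tts2_dirs(comfy_root)
    if not model_dir.is_dir():
        return False
    return any(entry != cache_dir for entry in model_dir.iterdir())
```

A result of False from has_local_models suggests that the files would have to come from the auto-download path instead. It says nothing about whether the right files are present.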

Highlighted Details

  • Nodes for downloading and loading models from HuggingFace or ModelScope.
  • Model unloading functionality.
  • Advanced voice cloning via reference audio, description, or emotion vectors.
  • Support for adding pauses between dialogue segments (e.g., -0.5s-).
  • Timed text segments using bracket notation (start, end) for subtitle alignment.
  • Compatibility with ComfyUI v3 node paradigms.
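
The pause marker mentioned in the list above can be illustrated with a small parser. This is a minimal sketch, assuming a marker has the form -<seconds>s- as in the -0.5s- example. The function name split_pauses and the regular expression are hypothetical and are not taken from the node's source, which may parse the text differently.

```python
import re

# Assumed marker shape: hyphen, decimal number, "s", hyphen (e.g. "-0.5s-").
PAUSE_RE = re.compile(r"-(\d+(?:\.\d+)?)s-")


def split_pauses(text):
    """Split text into ("text", str) and ("pause", seconds) items, in order."""
    items = []
    pos = 0
    for match in PAUSE_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            items.append(("text", chunk))
        items.append(("pause", float(match.group(1))))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items
```

Calling split_pauses("Hello there. -0.5s- How are you?") yields two text items with a 0.5-second pause item between them. A TTS pipeline could render that pause as silence of the same length.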

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), sponsorships, or roadmap are provided in the README.

Licensing & Compatibility

The project is provided "as is" with no explicit warranties. A disclaimer states the author and copyright holders are not liable for any claims, damages, or responsibilities arising from its use. It strictly prohibits illegal use and copyright infringement, placing responsibility on the user to comply with all applicable laws and regulations. No specific open-source license (e.g., MIT, Apache) is mentioned, and compatibility for commercial use or closed-source linking is not addressed.

Limitations & Caveats

The project includes a broad disclaimer of liability, making users solely responsible for legal compliance and any issues arising from usage. It requires a recent ComfyUI version to function correctly due to its reliance on v3 node paradigms. The extensive list of required model files and their specific placement can be complex to set up.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 8 stars in the last 30 days
