ComfyUI-GPT_SoVITS by AIFSH

Voice cloning and TTS integrated into ComfyUI

Created 2 years ago
250 stars

Summary

This repository provides a custom node for ComfyUI, integrating the GPT-SoVITS model to enable voice cloning and text-to-speech (TTS) capabilities directly within a visual workflow. It targets ComfyUI users, AI researchers, and content creators seeking to perform advanced audio synthesis and manipulation through a node-based interface, simplifying complex AI audio tasks.

How It Works

The node wraps GPT-SoVITS so that voice cloning and TTS run as steps in a ComfyUI graph rather than through separate scripts. It also accepts SRT subtitle files as input, which supports multi-speaker inference and fine-tuning.

Quick Start & Requirements

  • Installation: Clone the repository into ComfyUI's custom_nodes directory, navigate into the cloned directory, and run pip install -r requirements.txt.
  • Prerequisites: ffmpeg must be installed and accessible from the command line (Linux: apt install ffmpeg; Windows: WingetUI). Model weights are downloaded automatically from Hugging Face, with mirror options provided for regions where it is hard to reach.
  • Hardware: A Windows standalone build is available; it requires an NVIDIA GPU and CUDA >= 11.8.
  • Links: Repository: https://github.com/AIFSH/ComfyUI-GPT_SoVITS.
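
The prerequisites above can be checked before launching ComfyUI. The following is a minimal sketch, not part of the repository: the function name preflight and its parameters are hypothetical. HF_ENDPOINT is the environment variable that the huggingface_hub library reads to pick its download endpoint; whether this node's download code honors it is an assumption here, and the mirror URL is left for the user to supply.

```python
import os
import shutil


def preflight(mirror_url=None, env=None):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper, not part of ComfyUI-GPT_SoVITS.
    """
    env = os.environ if env is None else env
    problems = []

    # ffmpeg must be reachable through PATH, as the prerequisites state.
    if shutil.which("ffmpeg", path=env.get("PATH")) is None:
        problems.append("ffmpeg not found on PATH")

    # Assumption: weights are fetched through huggingface_hub, which reads
    # HF_ENDPOINT. It must be set before that library is imported.
    if mirror_url:
        env["HF_ENDPOINT"] = mirror_url

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
```

Calling preflight() with no arguments inspects the current process environment; passing a mirror URL only sets the variable for the current process and anything it starts afterwards.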

Highlighted Details

  • Full integration of GPT-SoVITS for voice cloning and TTS within ComfyUI.
  • Support for SRT files enables multi-speaker fine-tuning and inference.
  • A portable standalone build is offered for Windows users with NVIDIA GPUs.
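
For readers unfamiliar with the SRT format mentioned above, here is a minimal sketch of reading subtitle cues into timed text segments, the kind of input a subtitle-driven TTS step would consume. It illustrates the standard SRT layout only (index line, "start --> end" timing line, then text); the names Cue and parse_srt are hypothetical, and how the node itself reads SRT files or assigns speakers is not described in the summary.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    h, m, s, ms = map(int, _TIME.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(content):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # not a complete cue: index, timing, and text are required
        start, end = lines[1].split("-->")
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), _seconds(start), _seconds(end), text))
    return cues
```

Each cue carries its own time window, so synthesized audio for a line can be placed at the matching offset. Timing lines with trailing position data are not handled by this sketch.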

Maintenance & Community

  • Community: Mentions a "WeChat Group" for community interaction.
  • Contributors/Sponsorships: Not explicitly detailed in the provided README.

Licensing & Compatibility

  • License: No explicit open-source license is declared in the README.
  • Compatibility: A disclaimer warns against illegal usage and mandates adherence to DMCA and local laws, suggesting potential restrictions on use cases. Commercial use implications are unclear due to the lack of a defined license.

Limitations & Caveats

Without a formal license, redistribution and commercial use carry legal risk for adopters. The disclaimer places full responsibility for legal compliance (DMCA and local laws) on users, which reflects the misuse potential of voice cloning. Automatic model downloads from Hugging Face may need a mirror or manual configuration in some network environments.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 1 star in the last 30 days
