ComfyUI-F5-TTS by niknah

Text-to-speech voice cloning and generation for ComfyUI

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

Project Summary

This ComfyUI node integrates the F5-TTS text-to-speech model so users can generate speech from text, with a particular focus on cloning custom voices from provided audio samples. It targets ComfyUI users who want personalized TTS inside a node-based workflow, and offers near real-time voice cloning and multi-voice synthesis.

How It Works

The node uses the F5-TTS engine. For voice cloning, users supply a .wav audio sample and its corresponding .txt transcription. Custom F5-TTS models and language vocab files can be loaded by placing them in the models/checkpoints/F5-TTS directory. Advanced features include multi-voice output controlled via prompt tags (e.g., {main}, {deep}) and multi-voice input synthesis. Using certain models, such as BigVGAN, requires a minor code modification within the repository's dependencies.
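
To make the prompt-tag idea concrete, here is a minimal Python sketch of how a prompt containing tags such as {main} and {deep} could be split into per-voice segments. The summary does not describe how the node actually parses tags, so the tag pattern, the split_by_voice helper, and the default voice name are all assumptions made for illustration only.

```python
import re

# Assumed tag syntax: a voice name in curly braces, e.g. {main} or {deep}.
# The node's real parsing rules are not documented in this summary.
TAG_PATTERN = re.compile(r"\{(\w+)\}")


def split_by_voice(prompt, default_voice="main"):
    """Split a prompt into (voice, text) segments at each {voice} tag.

    Text before the first tag is assigned to default_voice (hypothetical).
    """
    segments = []
    voice = default_voice
    pos = 0
    for match in TAG_PATTERN.finditer(prompt):
        text = prompt[pos:match.start()].strip()
        if text:
            segments.append((voice, text))
        voice = match.group(1)  # switch to the voice named by this tag
        pos = match.end()
    tail = prompt[pos:].strip()
    if tail:
        segments.append((voice, tail))
    return segments


example = "{main} Hello there. {deep} Good evening. {main} Nice to meet you."
for voice, text in split_by_voice(example):
    print(f"{voice}: {text}")
```

Each segment could then be synthesized with the voice sample registered under that name; how the node maps tag names to samples is not covered here.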

Quick Start & Requirements

  • Installation: Recommended via ComfyUI-manager. Alternatively, clone the repository into custom_nodes/ComfyUI-F5-TTS, run git submodule update --init --recursive, and pip install -r requirements.txt. A troubleshooting guide is provided for submodule issues.
  • Prerequisites: ffmpeg-full-shared (installable via conda install ffmpeg or a system package manager), ComfyUI, a Python environment, and the Python packages listed in requirements.txt.
  • Links: F5-TTS repository: https://github.com/SWivid/F5-TTS.
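
As a rough aid for the setup steps above, the following Python sketch reports which of the listed pieces appear to be in place under a ComfyUI installation. The preflight function is hypothetical and not part of the project. Looking for an ffmpeg executable on PATH is only an approximate proxy for the ffmpeg-full-shared prerequisite, and the models/checkpoints/F5-TTS directory matters only when custom models are used.

```python
import shutil
from pathlib import Path


def preflight(comfy_root):
    """Report which documented prerequisites appear to be present.

    comfy_root is the ComfyUI installation directory (supplied by the caller).
    """
    root = Path(comfy_root)
    node_dir = root / "custom_nodes" / "ComfyUI-F5-TTS"
    return {
        # Assumption: an ffmpeg executable on PATH indicates ffmpeg is installed.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # The node repository cloned into custom_nodes/ComfyUI-F5-TTS.
        "node_cloned": node_dir.is_dir(),
        # requirements.txt is what pip install -r expects to find.
        "requirements_file": (node_dir / "requirements.txt").is_file(),
        # Optional: only needed for custom F5-TTS models and vocab files.
        "custom_model_dir": (root / "models" / "checkpoints" / "F5-TTS").is_dir(),
    }


if __name__ == "__main__":
    for name, ok in preflight(".").items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run from the ComfyUI root, this prints one line per check. It does not verify that the git submodules were initialized or that the Python packages are installed.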

Highlighted Details

  • Enables near real-time cloning of the user's own voice.
  • Supports multi-voice output generation using distinct prompt tags.
  • Supports many languages and custom F5-TTS models.
  • Includes an advanced node with Time-domain harmonic scaling (TDHS).
  • F5-TTSv1 is set as the default model.

Maintenance & Community

The project shows signs of active development through version updates (e.g., 1.0.23). Installation via ComfyUI-manager is recommended for streamlined updates. No specific community channels (Discord, Slack) or notable contributors/sponsorships are mentioned in the provided text.

Licensing & Compatibility

The provided README does not specify a software license. This lack of information prevents an assessment of its compatibility for commercial use or integration into closed-source projects.

Limitations & Caveats

Input voice samples are limited to 15 seconds, and the audio may be cut off mid-word. Some of the listed custom F5-TTS languages have not been tested by the author. Using certain models, such as BigVGAN, requires manual code adjustments. Git submodule handling can be problematic and may require cloning the repository manually.
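
Given the 15-second limit, it can help to check a voice sample before loading it into a workflow. The sketch below uses Python's standard wave module to measure a PCM .wav file and also checks for a transcription file. The check_voice_sample helper is hypothetical, and the rule that the .txt file shares the .wav file's base name is an assumption, since the summary only says the transcription "corresponds" to the sample.

```python
import wave
from pathlib import Path

MAX_SAMPLE_SECONDS = 15.0  # limit stated in the summary


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_voice_sample(wav_path, max_seconds=MAX_SAMPLE_SECONDS):
    """Return a list of problems found with a voice sample (empty if none)."""
    wav_path = Path(wav_path)
    problems = []
    # Assumption: the transcription sits next to the sample, same base name.
    if not wav_path.with_suffix(".txt").is_file():
        problems.append("missing transcription (.txt)")
    duration = wav_duration_seconds(wav_path)
    if duration > max_seconds:
        problems.append(
            f"sample is {duration:.1f}s, over the {max_seconds:.0f}s limit"
        )
    return problems
```

Calling check_voice_sample("voice.wav") on a sample that is short enough and has a matching voice.txt returns an empty list. The wave module reads only uncompressed PCM files, so other encodings would need a different reader.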

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days
