ComfyUI-F5-TTS by niknah

Text-to-speech voice cloning and generation for ComfyUI

Created 1 year ago

257 stars

Top 98.3% on SourcePulse

Project Summary

This ComfyUI node integrates the F5-Text-To-Speech (TTS) model, enabling users to generate speech from text, with a particular focus on cloning custom voices using provided audio samples. It targets ComfyUI users seeking advanced, personalized TTS capabilities within a node-based workflow, offering near real-time voice cloning and multi-voice synthesis.

How It Works

The node leverages the F5-TTS engine, requiring users to supply a .wav audio sample and its corresponding .txt transcription for voice cloning. It supports loading custom F5-TTS models and language vocabs by placing them in the models/checkpoints/F5-TTS directory. Advanced features include multi-voice output controlled via prompt tags (e.g., {main}, {deep}) and multi-voice input synthesis. Integration with specific models like BigVGAN necessitates a minor code modification within the repository's dependencies.
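The .wav/.txt pairing described above can be illustrated with a small Python sketch. It assumes, hypothetically, that each voice sample and its transcription share a file stem (e.g. main.wav and main.txt) in one folder. The node's actual folder and lookup rules are not described here, and find_voice_samples is an illustrative name, not part of the node.

```python
from pathlib import Path


def find_voice_samples(directory: str) -> dict[str, tuple[Path, str]]:
    """Pair each .wav sample with the .txt transcription that shares its stem.

    Hypothetical helper: returns {stem: (wav_path, transcript_text)}.
    A .wav without a matching .txt is skipped, because cloning needs both.
    """
    samples: dict[str, tuple[Path, str]] = {}
    for wav in sorted(Path(directory).glob("*.wav")):
        txt = wav.with_suffix(".txt")  # main.wav -> main.txt
        if not txt.is_file():
            continue  # no transcription, so this sample cannot be used for cloning
        samples[wav.stem] = (wav, txt.read_text(encoding="utf-8").strip())
    return samples
```

Calling find_voice_samples on a folder returns only the complete pairs, which makes a missing transcription easy to spot before running a workflow.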

Quick Start & Requirements

Installation: ComfyUI-Manager is the recommended route. Alternatively, clone the repository into custom_nodes/ComfyUI-F5-TTS, run git submodule update --init --recursive, and then run pip install -r requirements.txt. The README includes a troubleshooting guide for submodule issues.
Prerequisites: ffmpeg-full-shared (installable via conda install ffmpeg or system package managers), ComfyUI, Python environment, and necessary Python packages listed in requirements.txt.
Links: F5-TTS repository: https://github.com/SWivid/F5-TTS.
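Before a manual install, a quick check that the required command-line tools are on PATH can save a failed run. This Python sketch uses only shutil.which; missing_prerequisites is a hypothetical helper, and finding an ffmpeg binary does not by itself confirm that the shared-library build (ffmpeg-full-shared) is installed.

```python
import shutil


def missing_prerequisites(tools: tuple[str, ...] = ("git", "ffmpeg")) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing on PATH:", ", ".join(missing))
    else:
        print("git and ffmpeg found; continue with the submodule and pip steps.")
```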

Highlighted Details

Enables near real-time cloning of the user's own voice.
Supports multi-voice output generation using distinct prompt tags.
Supports numerous languages through custom F5-TTS models and vocabs.
Includes an advanced node with time-domain harmonic scaling (TDHS).
F5-TTSv1 is set as the default model.
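The prompt-tag mechanism for multi-voice output can be sketched as splitting the text at each {voice} tag. The regular expression, the default voice name main for untagged leading text, and split_by_voice are assumptions for illustration; the node's real parser may behave differently.

```python
import re

# Hypothetical tag syntax mirroring the {main}/{deep} examples.
TAG = re.compile(r"\{([A-Za-z0-9_-]+)\}")


def split_by_voice(text: str, default_voice: str = "main") -> list[tuple[str, str]]:
    """Split prompt text into (voice, chunk) pairs at each {voice} tag."""
    segments: list[tuple[str, str]] = []
    voice = default_voice  # assumed voice for any text before the first tag
    pos = 0
    for match in TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((voice, chunk))
        voice = match.group(1)  # switch voices at the tag
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((voice, tail))
    return segments


print(split_by_voice("{main} Hello there. {deep} I am the deep voice. {main} Bye!"))
# [('main', 'Hello there.'), ('deep', 'I am the deep voice.'), ('main', 'Bye!')]
```

Each segment would presumably be synthesized with the reference sample for its voice and the results concatenated; how the node maps voice names to samples is not described here.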

Maintenance & Community

The project shows signs of active development through version updates (e.g., 1.0.23). Installation via ComfyUI-manager is recommended for streamlined updates. No specific community channels (Discord, Slack) or notable contributors/sponsorships are mentioned in the provided text.

Licensing & Compatibility

The provided README does not specify a software license. This lack of information prevents an assessment of its compatibility for commercial use or integration into closed-source projects.

Limitations & Caveats

Input voice samples are limited to 15 seconds, and audio cutoff may occur mid-word. Some custom F5-TTS languages listed have not been tested by the author. Integration with certain models like BigVGAN requires manual code adjustments. Git submodule handling can be problematic, potentially requiring manual repository cloning.
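Given the 15-second limit, a sample's length can be checked and hard-trimmed with Python's standard wave module. This is a sketch assuming uncompressed PCM .wav input (the wave module does not read compressed formats); wav_duration and trim_wav are hypothetical helpers, and the 15-second constant is taken from the limit stated above.

```python
import wave

MAX_SECONDS = 15.0  # input voice sample limit stated for the node


def wav_duration(path: str) -> float:
    """Duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def trim_wav(src: str, dst: str, max_seconds: float = MAX_SECONDS) -> float:
    """Copy at most `max_seconds` of audio from src to dst; return the new duration.

    This is a hard cut, so it can also land mid-word; trimming at a natural
    pause in an audio editor gives a cleaner reference sample.
    """
    with wave.open(src, "rb") as r:
        params = r.getparams()
        keep = min(r.getnframes(), int(max_seconds * r.getframerate()))
        frames = r.readframes(keep)
    with wave.open(dst, "wb") as w:
        w.setparams(params)  # same channels, sample width and rate as the source
        w.writeframes(frames)  # header frame count is corrected on write
    return keep / params.framerate
```

If the sample is trimmed, its .txt transcription should be shortened to match the audio that remains.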


Explore Similar Projects

Voice-Clone-Studio by FranckyB

ComfyUI-VoxCPM by wildminder

ComfyUI_IndexTTS by billwuhao

SonicVale by xcLee001

ComfyUI-VibeVoice by wildminder

ComfyUI-Qwen-TTS by flybirdxx

MOSS-TTSD by OpenMOSS

FireRedTTS2 by FireRedTeam

neutts by neuphonic

KittenTTS by KittenML

Zonos by Zyphra

Qwen3-TTS by QwenLM