niknah
Text-to-speech voice cloning and generation for ComfyUI
Top 99.8% on SourcePulse
This ComfyUI node integrates the F5-Text-To-Speech (TTS) model, enabling users to generate speech from text, with a particular focus on cloning custom voices using provided audio samples. It targets ComfyUI users seeking advanced, personalized TTS capabilities within a node-based workflow, offering near real-time voice cloning and multi-voice synthesis.
How It Works
The node leverages the F5-TTS engine, requiring users to supply a .wav audio sample and its corresponding .txt transcription for voice cloning. It supports loading custom F5-TTS models and language vocabs by placing them in the models/checkpoints/F5-TTS directory. Advanced features include multi-voice output controlled via prompt tags (e.g., {main}, {deep}) and multi-voice input synthesis. Integration with specific models like BigVGAN necessitates a minor code modification within the repository's dependencies.
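The prompt-tag mechanism described above can be pictured with a minimal sketch. It assumes that a tag such as {main} or {deep} switches the active voice for the text that follows it. The function name `split_by_voice`, the default voice, and the exact tag grammar are hypothetical and are not taken from the node's source.

```python
import re

# Assumed tag syntax: a voice name in curly braces, e.g. {main} or {deep}.
TAG_PATTERN = re.compile(r"\{(\w+)\}")


def split_by_voice(prompt, default_voice="main"):
    """Split a prompt into (voice, text) segments at each {voice} tag.

    Hypothetical helper for illustration; the real node may parse tags differently.
    """
    segments = []
    voice = default_voice
    position = 0
    for match in TAG_PATTERN.finditer(prompt):
        # Text before this tag belongs to the previously active voice.
        text = prompt[position:match.start()].strip()
        if text:
            segments.append((voice, text))
        voice = match.group(1)
        position = match.end()
    # Whatever follows the last tag belongs to the last active voice.
    tail = prompt[position:].strip()
    if tail:
        segments.append((voice, tail))
    return segments


for voice, text in split_by_voice("{main} Hello there. {deep} Good evening. {main} Bye."):
    print(f"{voice}: {text}")
```

Each segment could then be synthesized with the reference sample that belongs to its voice name and the resulting audio concatenated in order.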
Quick Start & Requirements
Install: place the repository in custom_nodes/ComfyUI-F5-TTS, then run git submodule update --init --recursive and pip install -r requirements.txt. A troubleshooting guide is provided for submodule issues.
Prerequisites: ffmpeg-full-shared (installable via conda install ffmpeg or system package managers), ComfyUI, a Python environment, and the Python packages listed in requirements.txt.
Upstream F5-TTS project: https://github.com/SWivid/F5-TTS
Highlighted Details
Maintenance & Community
The project shows signs of active development through version updates (e.g., 1.0.23). Installation via ComfyUI-manager is recommended for streamlined updates. No specific community channels (Discord, Slack) or notable contributors/sponsorships are mentioned in the provided text.
Licensing & Compatibility
The provided README does not specify a software license. This lack of information prevents an assessment of its compatibility for commercial use or integration into closed-source projects.
Limitations & Caveats
Input voice samples are limited to 15 seconds, and audio cutoff may occur mid-word. Some custom F5-TTS languages listed have not been tested by the author. Integration with certain models like BigVGAN requires manual code adjustments. Git submodule handling can be problematic, potentially requiring manual repository cloning.
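Since samples are limited to 15 seconds and each needs a transcription, a quick pre-flight check before loading a voice may save a failed run. The sketch below uses only the Python standard library. It assumes the transcript shares the sample's base name (voice.wav next to voice.txt), which is an illustrative convention and not confirmed by the source, and the helper names are hypothetical. The `wave` module reads uncompressed PCM .wav files only.

```python
import wave
from pathlib import Path

MAX_SAMPLE_SECONDS = 15.0  # limit stated in the caveats above


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_voice_sample(wav_path):
    """Return a list of problems with a voice sample; an empty list means it looks fine.

    Assumption: the transcript sits next to the sample with a .txt suffix.
    """
    wav_path = Path(wav_path)
    problems = []
    transcript = wav_path.with_suffix(".txt")
    if not transcript.is_file():
        problems.append(f"missing transcript: {transcript.name}")
    duration = wav_duration_seconds(wav_path)
    if duration > MAX_SAMPLE_SECONDS:
        problems.append(
            f"sample is {duration:.1f}s, longer than {MAX_SAMPLE_SECONDS:.0f}s"
        )
    return problems
```

Calling `check_voice_sample("voice.wav")` returns a list of messages. Trimming an over-long sample at a pause in speech, and shortening its transcript to match, is one way to keep the reference audio from ending mid-word.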