Ultimate-TTS-Studio-SUP3R-Edition by SUP3RMASS1VE

All-in-one Text-to-Speech studio

Created 1 year ago

268 stars

Top 95.5% on SourcePulse

Project Summary

Ultimate-TTS-Studio-SUP3R-Edition is an NVIDIA-only, all-in-one Text-to-Speech (TTS) studio that integrates nine engines (Kokoro, KittenTTS, Higgs, Chatterbox, Fish-Speech, F5, Index-TTS, IndexTTS2, VibeVoice) into a single Gradio interface. It targets users who need versatile speech synthesis and provides conversation mode, eBook-to-audiobook conversion, and custom voice cloning in one application.

How It Works

The studio runs several TTS engines inside one Gradio application and supports reference audio cloning, multilingual voices, and real-time synthesis. Output can be processed with audio effects (reverb, echo, EQ, pitch shift, gain), and models can be loaded and unloaded manually to control GPU memory use.
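To make the manual load/unload idea concrete, here is a minimal sketch of a registry that keeps only the engines a user has explicitly loaded. It is an illustration under assumptions, not the project's code: the ModelRegistry class, its method names, and the loader callables are all hypothetical.

```python
from typing import Any, Callable, Dict, List


class ModelRegistry:
    """Hypothetical helper: holds only the engines the user explicitly loaded."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        # Maps an engine name to a zero-argument function that builds its model.
        self._loaders = loaders
        self._loaded: Dict[str, Any] = {}

    def load(self, name: str) -> Any:
        """Build the model on first request and reuse it afterwards."""
        if name not in self._loaders:
            raise KeyError(f"unknown engine: {name}")
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def unload(self, name: str) -> bool:
        """Drop the registry's reference; returns True if the engine was loaded."""
        if name in self._loaded:
            del self._loaded[name]
            return True
        return False

    def loaded(self) -> List[str]:
        """Names of the engines currently held in memory, sorted."""
        return sorted(self._loaded)
```

Dropping the reference only makes the memory reclaimable. In a PyTorch-based engine this is commonly followed by gc.collect() and torch.cuda.empty_cache(); whether the project does so is not stated here.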

Quick Start & Requirements

Installation: one-click via Pinokio or Dione, the automated installer script (RUN_INSTALLER), or a manual Conda setup.
Prerequisites: NVIDIA GPU (tested on Windows 11 with an RTX 4090), CUDA >= 12.8, Conda, Python 3.10, pynini, wetextprocessing, espeak-ng, and a Hugging Face account and token (for Fish Speech models).
Conda: https://docs.conda.io/en/latest/miniconda.html
Repo: https://github.com/SUP3RMASS1VE/Ultimate-TTS-Studio-SUP3R-Edition.git
Launch: python launch.py
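Before a manual setup, a quick check of the listed prerequisites can save a failed install. The sketch below is not part of the project. It assumes the tools are exposed on PATH under the executable names conda, nvidia-smi, and espeak-ng, and the missing_prerequisites function is a hypothetical helper. It does not verify the CUDA version, the Python packages, or the Hugging Face token.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple

# Executable names are assumptions; adjust them to match the local install.
REQUIRED_TOOLS = ("conda", "nvidia-smi", "espeak-ng")


def missing_prerequisites(
    python_version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if python_version != (3, 10):
        major, minor = python_version
        problems.append(f"Python 3.10 expected, found {major}.{minor}")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling missing_prerequisites() with no arguments checks the running interpreter and the real PATH. The two parameters exist only so the check can be exercised without touching the system.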

Highlighted Details

Integrates the Kokoro, KittenTTS, Higgs, Chatterbox, Fish-Speech, F5, Index-TTS, IndexTTS2, and VibeVoice engines.
Features: Unified interface, Kokoro conversation mode, eBook-to-audiobook conversion, custom voice cloning (Chatterbox, Kokoro), reference audio cloning, multilingual voices, professional audio effects.
UI optimized for dark mode; manual model management for memory control.

Maintenance & Community

Actively updated, with recent additions in September 2025. Development is supported by user donations (PayPal, Bitcoin). No community channels or roadmap are listed.

Licensing & Compatibility

The primary license is MIT, and dependencies include components under the MIT and Apache 2.0 licenses. MIT generally permits commercial use. The project requires an NVIDIA GPU and has been tested only on Windows 11; other platforms are not guaranteed to work.

Limitations & Caveats

Requires an NVIDIA GPU and has been tested only on Windows 11; compatibility with other operating systems and hardware is not guaranteed. Fish Speech may produce loud or muffled audio, so be cautious with playback volume. Fish Speech models must be downloaded manually and require Hugging Face authentication.
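One generic way to guard against unexpectedly loud output is to cap the peak level before playback. The sketch below is not taken from the project. It assumes audio is available as a sequence of float samples nominally in the range -1.0 to 1.0, and limit_peak is a hypothetical helper. It only addresses loudness, not muffled audio.

```python
from typing import List, Sequence


def limit_peak(samples: Sequence[float], ceiling: float = 0.5) -> List[float]:
    """Scale samples down so the largest absolute value does not exceed ceiling."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    peak = max((abs(s) for s in samples), default=0.0)
    if peak <= ceiling:
        # Already quiet enough: return an unchanged copy.
        return list(samples)
    scale = ceiling / peak
    return [s * scale for s in samples]
```

Uniform scaling preserves the waveform's shape, unlike hard clipping, which would distort the loudest passages.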

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

17 stars in the last 30 days