FranckyB / Voice-Clone-Studio
Gradio web UI for advanced voice cloning and design
Top 92.4% on SourcePulse
This project provides a Gradio-based web UI for voice cloning and voice design. It uses Qwen3-TTS and VibeVoice for speech synthesis, and Whisper or VibeVoice-ASR for transcription. It targets engineers and researchers who want to generate custom speech, create multi-speaker dialogues, and design voices from natural-language descriptions.
How It Works
Voice Clone Studio combines several speech models behind a single Gradio interface. Qwen3-TTS generates speech from text, with fast preset voices and voice design driven by descriptive prompts. VibeVoice is used for higher-quality and longer audio: it supports custom voice cloning from user-provided samples and up to 90 minutes of continuous speech. Reference audio is transcribed automatically by either OpenAI's Whisper or VibeVoice-ASR. Other key features are voice prompt caching, which speeds up subsequent generations with the same voice, and seed control for reproducible outputs.
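The two features named last, voice prompt caching and seed control, can be illustrated with a minimal standard-library sketch. This is not the project's code: VoicePromptCache, fake_build_prompt, and fake_generate are hypothetical names, the prompt builder stands in for whatever model-specific processing the real engines do on a reference sample, and a seeded random.Random stands in for model sampling.

```python
import hashlib
import random


class VoicePromptCache:
    """In-memory cache keyed by a hash of the reference audio and its transcript."""

    def __init__(self, build_prompt):
        # build_prompt is a hypothetical, expensive function that turns
        # (audio_bytes, transcript) into a reusable voice prompt.
        self._build_prompt = build_prompt
        self._store = {}
        self.misses = 0

    @staticmethod
    def key(audio_bytes: bytes, transcript: str) -> str:
        # Hash both inputs so a changed sample or transcript gets a new entry.
        digest = hashlib.sha256()
        digest.update(audio_bytes)
        digest.update(b"\x00")  # separator between the two fields
        digest.update(transcript.encode("utf-8"))
        return digest.hexdigest()

    def get(self, audio_bytes: bytes, transcript: str):
        k = self.key(audio_bytes, transcript)
        if k not in self._store:
            # First use of this voice: pay the cost once and remember it.
            self.misses += 1
            self._store[k] = self._build_prompt(audio_bytes, transcript)
        return self._store[k]


def fake_build_prompt(audio_bytes: bytes, transcript: str) -> dict:
    # Stand-in for real feature extraction (assumption, not the project's API).
    return {"n_bytes": len(audio_bytes), "text": transcript}


def fake_generate(prompt: dict, text: str, seed: int) -> list:
    # Stand-in for sampling: the same seed always yields the same sequence.
    rng = random.Random(seed)
    return [rng.randint(0, 255) for _ in range(8)]
```

With this shape, a second request for the same reference sample skips the prompt-building step, and passing the same seed to the generator reproduces the same output. A real implementation would seed the model framework's random number generator rather than Python's random module.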
Quick Start & Requirements
To install, clone the repository (git clone https://github.com/FranckyB/Voice-Clone-Studio.git), then run the setup script (setup.bat on Windows) or set up the environment manually. Prerequisites are Python 3.12+, a CUDA-compatible GPU (8GB+ VRAM recommended), SOX, and FFMPEG. Launch the UI with python voice_clone_studio.py or launch.bat. Flash Attention 2 is optional and can improve performance.
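Before running the setup, it can help to confirm the non-Python prerequisites are in place. The following is a rough pre-flight sketch, not part of the project: missing_prerequisites is a hypothetical helper, and it assumes the SOX and FFMPEG executables are named sox and ffmpeg on PATH. It does not check the GPU or VRAM.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)               # minimum version stated in the requirements
REQUIRED_TOOLS = ("sox", "ffmpeg")  # assumed executable names


def missing_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version = sys.version_info if version is None else version
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling missing_prerequisites() with no arguments checks the current interpreter and PATH. The version and which parameters exist only so the check can be exercised without touching the real environment.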
Maintenance & Community
The provided README does not contain specific details regarding notable contributors, sponsorships, community channels (like Discord or Slack), or a public roadmap.
Licensing & Compatibility
The project is licensed under the Apache License 2.0. Its core components also use the Apache 2.0 license (Qwen3-TTS, Gradio) and the MIT license (VibeVoice, Whisper). These permissive licenses generally allow for commercial use and integration into closed-source projects.
Limitations & Caveats
The VibeVoice engine may spontaneously add background music or sounds for realism, and it does not support style instructions. A CUDA-compatible GPU with sufficient VRAM is essential for optimal performance, particularly with larger models. External dependencies like SOX and FFMPEG must be installed separately.