WebUI for speech recognition, translation, and dubbing
Voice-Pro is a comprehensive Gradio WebUI for AI-powered multimedia content creation, aimed at creators, researchers, and multilingual professionals. It combines YouTube downloading, voice separation, speech recognition, translation, and text-to-speech (TTS) in a single platform, as an alternative to chaining together separate specialized tools.
How It Works
The application uses a modular architecture that integrates several state-of-the-art AI models for each function. For speech recognition, it supports Whisper, Faster-Whisper, Whisper-Timestamped, and WhisperX. Zero-shot voice cloning is handled by F5-TTS, E2-TTS, and CosyVoice. Text-to-speech is provided by Edge-TTS and kokoro, with Azure TTS available in paid tiers. Media handling includes YouTube downloading via yt-dlp and vocal isolation with Demucs. Deep-Translator provides translation for over 100 languages, with Azure Translator available in paid versions.
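One way to picture this modular design is a registry that maps each stage (speech recognition, translation, TTS) to interchangeable backends, so a user can pick one engine per stage. The following is a minimal Python sketch under that assumption, not Voice-Pro's actual code: the Pipeline class, the stage names, and the stub functions are hypothetical, and the stubs only fake each stage's output in place of the real models named above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A stage function takes the current job state and returns an updated copy.
Stage = Callable[[dict], dict]


@dataclass
class Pipeline:
    """Hypothetical registry: stage name -> {backend name -> function}."""
    backends: Dict[str, Dict[str, Stage]] = field(default_factory=dict)

    def register(self, stage: str, name: str, fn: Stage) -> None:
        # Several backends can share one stage, e.g. different ASR engines.
        self.backends.setdefault(stage, {})[name] = fn

    def run(self, job: dict, plan: List[Tuple[str, str]]) -> dict:
        # plan is an ordered list of (stage, backend) pairs chosen by the user.
        for stage, name in plan:
            try:
                fn = self.backends[stage][name]
            except KeyError:
                raise ValueError(f"no backend {name!r} for stage {stage!r}") from None
            job = fn(job)
        return job


# Stubs standing in for real models; they only fake each stage's output.
def stub_asr(job: dict) -> dict:
    return {**job, "transcript": "hello world"}


def stub_translate(job: dict) -> dict:
    # Upper-casing is a placeholder for a real translation call.
    return {**job, "translation": job["transcript"].upper()}


def stub_tts(job: dict) -> dict:
    return {**job, "audio": f"<speech:{job['translation']}>"}


pipe = Pipeline()
pipe.register("asr", "faster-whisper", stub_asr)
pipe.register("translate", "deep-translator", stub_translate)
pipe.register("tts", "edge-tts", stub_tts)

result = pipe.run(
    {"source": "clip.wav"},
    [("asr", "faster-whisper"), ("translate", "deep-translator"), ("tts", "edge-tts")],
)
print(result["audio"])  # <speech:HELLO WORLD>
```

In this sketch, swapping Faster-Whisper for WhisperX would mean registering one more function under the "asr" stage and changing a single entry in the plan, leaving the other stages untouched.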
Quick Start & Requirements
Run configure.bat, followed by start.bat (Windows only).
Maintenance & Community
The project is actively developed by ABUS, a Korean startup. Contributions are welcome via Issues and Pull Requests. For inquiries, contact abus.aikorea@gmail.com.
Licensing & Compatibility
The project builds on open-source components (e.g., Demucs, yt-dlp, Gradio, Whisper). The Azure-based TTS and translation services are available only with a paid subscription. The project itself appears to be free to use, subject to the free-trial limitations.
Limitations & Caveats
The free trial is limited to media of up to 60 seconds. Linux and macOS are not supported. Upgrading from v2.x to v3.x requires a clean install. The initial setup and model downloads can be time-consuming.
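Because free-trial media is capped at 60 seconds, a longer clip has to be shortened before it is loaded. As a minimal sketch, assuming the input is an uncompressed PCM WAV file (other formats would need converting first), Python's standard wave module can measure and trim a clip. MAX_SECONDS, wav_duration, and trim_wav are illustrative names, not part of Voice-Pro.

```python
import io
import wave

# Illustrative helpers, not part of Voice-Pro; PCM WAV input only.
MAX_SECONDS = 60.0  # free-trial media limit


def wav_duration(fileobj) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(fileobj, "rb") as w:
        return w.getnframes() / w.getframerate()


def trim_wav(src, dst, max_seconds: float = MAX_SECONDS) -> float:
    """Copy at most max_seconds of audio from src to dst; return the written duration."""
    with wave.open(src, "rb") as r:
        params = r.getparams()
        n = min(r.getnframes(), int(max_seconds * r.getframerate()))
        frames = r.readframes(n)
    with wave.open(dst, "wb") as w:
        w.setparams(params)   # same channels, sample width, and rate
        w.writeframes(frames)  # header frame count is patched on write
    return n / params.framerate


# Build 90 s of silent 8 kHz mono 16-bit audio in memory, then trim it.
src = io.BytesIO()
with wave.open(src, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000 * 90)
src.seek(0)

dst = io.BytesIO()
written = trim_wav(src, dst)
dst.seek(0)
print(written, wav_duration(dst))  # 60.0 60.0
```

Clips already under the limit are copied unchanged, since the frame count is capped with min().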