voice-pro by abus-aikorea

WebUI for speech recognition, translation, and dubbing

Created 1 year ago

11,133 stars

Top 4.7% on SourcePulse

1 Expert Loves This Project

Abubakar Abid

Cofounder of Gradio

Project Summary

Voice-Pro is a Gradio WebUI for AI-powered multimedia content creation, aimed at creators, researchers, and multilingual professionals. It combines YouTube downloading, voice separation, speech recognition, translation, and text-to-speech (TTS) in a single application, so one tool covers work that would otherwise require several specialized ones.

How It Works

The application is modular and offers several AI models for each function. Speech recognition supports Whisper, Faster-Whisper, Whisper-Timestamped, and WhisperX. Zero-shot voice cloning is handled by F5-TTS, E2-TTS, and CosyVoice. TTS is provided by Edge-TTS and kokoro, with Azure TTS available in paid tiers. Audio processing includes YouTube downloading via yt-dlp and vocal isolation with Demucs. Translation across more than 100 languages is handled by Deep-Translator, with Azure Translator available in paid versions.
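The per-function model choices described above can be pictured as a small registry. The following is a minimal sketch, not Voice-Pro's actual code: the backend names come from this summary, while `BACKENDS`, `PAID_ONLY`, `DUBBING_STAGES`, `available_backends`, and `plan_pipeline` are hypothetical names, and the stage order is an assumption about a typical dubbing run.

```python
# Hypothetical registry: backend names are taken from the summary above;
# the structure itself is illustrative and not taken from the Voice-Pro codebase.
BACKENDS = {
    "download": ["yt-dlp"],
    "vocal_isolation": ["Demucs"],
    "speech_recognition": ["Whisper", "Faster-Whisper", "Whisper-Timestamped", "WhisperX"],
    "translation": ["Deep-Translator", "Azure Translator"],
    "voice_cloning": ["F5-TTS", "E2-TTS", "CosyVoice"],
    "tts": ["Edge-TTS", "kokoro", "Azure TTS"],
}

# Backends the summary describes as available only in paid versions.
PAID_ONLY = {"Azure TTS", "Azure Translator"}

# Assumed stage order for a dubbing run (not confirmed by the source).
DUBBING_STAGES = ["download", "vocal_isolation", "speech_recognition", "translation", "tts"]


def available_backends(function, paid=False):
    """Return the backends for one function, hiding paid-only ones on the free tier."""
    return [b for b in BACKENDS[function] if paid or b not in PAID_ONLY]


def plan_pipeline(choices, paid=False):
    """Pick one backend per stage, defaulting to the first available option."""
    plan = []
    for stage in DUBBING_STAGES:
        options = available_backends(stage, paid)
        backend = choices.get(stage, options[0])
        if backend not in options:
            raise ValueError(f"{backend!r} is not available for {stage!r}")
        plan.append((stage, backend))
    return plan


if __name__ == "__main__":
    for stage, backend in plan_pipeline({"speech_recognition": "WhisperX"}):
        print(f"{stage}: {backend}")
```

Keeping the paid-only set separate from the registry means a free-tier run simply never sees the Azure options, which mirrors the free/paid split described in this summary.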

Quick Start & Requirements

Install: Clone the repository and run configure.bat followed by start.bat.
Prerequisites: Windows 10/11 (64-bit), NVIDIA GPU with CUDA 12.4 (recommended), 4GB+ VRAM (8GB+ preferred), 4GB+ RAM, 20GB+ storage, internet connection. Linux/Mac are unsupported.
Setup Time: Initial setup and dependency downloads can take over 1 hour.
Links: YouTube Showcase Demo, Shopify (Paid Version)
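Because setup can take over an hour, it may be worth checking the basic prerequisites before running configure.bat. The script below is a rough sketch using only the Python standard library; it is not part of Voice-Pro, and `check_prerequisites` and `MIN_FREE_GB` are hypothetical names. It covers the operating system, free disk space, and the presence of `nvidia-smi` on the PATH; it does not verify RAM, VRAM, or the CUDA version.

```python
import platform
import shutil

MIN_FREE_GB = 20  # "20GB+ storage" from the prerequisites above


def check_prerequisites(system, is_64bit, free_gb, has_nvidia_smi):
    """Return a list of human-readable problems; an empty list means the basic checks passed."""
    problems = []
    if system != "Windows":
        problems.append(f"{system} is unsupported; Windows 10/11 is required")
    if not is_64bit:
        problems.append("a 64-bit system is required")
    if free_gb < MIN_FREE_GB:
        problems.append(f"only {free_gb:.1f} GB free; at least {MIN_FREE_GB} GB is needed")
    if not has_nvidia_smi:
        # The GPU is recommended rather than required, so this is advisory.
        problems.append("nvidia-smi not found; an NVIDIA GPU with CUDA 12.4 is recommended")
    return problems


if __name__ == "__main__":
    problems = check_prerequisites(
        system=platform.system(),
        # Heuristic: machine names such as "AMD64" or "x86_64" end in "64".
        is_64bit=platform.machine().endswith("64"),
        # Free space on the drive holding the current directory, in GiB.
        free_gb=shutil.disk_usage(".").free / 1024**3,
        has_nvidia_smi=shutil.which("nvidia-smi") is not None,
    )
    if problems:
        for problem in problems:
            print("WARNING:", problem)
    else:
        print("Basic prerequisites look fine.")
```

Run it from the folder where the repository will be cloned so that the disk check measures the right drive.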

Highlighted Details

Supports zero-shot voice cloning with F5-TTS, E2-TTS, and CosyVoice.
Integrates multiple Whisper variants for speech-to-text, including WhisperX.
Offers multilingual TTS with Edge-TTS and kokoro, plus YouTube downloading and vocal isolation.
Free trial limited to 60-second media processing; subscription offers unlimited usage and Azure services.

Maintenance & Community

The project is actively developed by ABUS, a startup from Korea. Contributions are welcomed via Issues and Pull Requests. Contact email for inquiries: abus.aikorea@gmail.com.

Licensing & Compatibility

The project builds on open-source components (e.g., Demucs, yt-dlp, Gradio, Whisper). Azure TTS and Azure Translator are available only with a paid subscription. The project itself appears to be free to download and use, subject to the free trial's 60-second media limit.

Limitations & Caveats

The free trial is restricted to 60 seconds of media processing. Linux and macOS are not supported. Upgrading from v2.x to v3.x requires a clean install. Initial setup and model downloads can take over an hour.

Health Check

Last Commit: 1 day ago
Responsiveness: 1 day

Star History

236 stars in the last 30 days