voice-pro  by abus-aikorea

WebUI for speech recognition, translation, and dubbing

created 1 year ago
4,354 stars

Top 11.4% on sourcepulse

Project Summary

Voice-Pro is a Gradio WebUI for AI-powered multimedia content creation, aimed at creators, researchers, and multilingual professionals. It combines YouTube downloading, voice separation, speech recognition, translation, and text-to-speech (TTS) in a single platform, so users do not need a separate specialized tool for each step.

How It Works

The application leverages a modular architecture, integrating multiple state-of-the-art AI models for each function. For speech recognition, it supports Whisper, Faster-Whisper, Whisper-Timestamped, and WhisperX. Zero-shot voice cloning is handled by F5-TTS, E2-TTS, and CosyVoice. TTS capabilities are provided by Edge-TTS and kokoro, with paid tiers offering Azure TTS. Audio processing includes YouTube downloading via yt-dlp and vocal isolation with Demucs. Translation for over 100 languages is facilitated by Deep-Translator, with Azure Translator available in paid versions.
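
The modular design described above can be pictured as a registry that maps each pipeline stage to its interchangeable backends. The following is a minimal sketch, not Voice-Pro's actual code: the names `BACKENDS`, `PAID_BACKENDS`, `STAGE_ORDER`, `available_backends`, and `plan_pipeline` are hypothetical, and the stage order is an assumption based on a typical dubbing workflow. Backends are represented only by name; no model is loaded or called.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: stage name -> backends named in the summary above.
BACKENDS: Dict[str, List[str]] = {
    "download": ["yt-dlp"],
    "vocal_isolation": ["Demucs"],
    "speech_recognition": ["Whisper", "Faster-Whisper", "Whisper-Timestamped", "WhisperX"],
    "translation": ["Deep-Translator"],
    "tts": ["Edge-TTS", "kokoro"],
    "voice_cloning": ["F5-TTS", "E2-TTS", "CosyVoice"],
}

# Backends the summary lists as available only in paid versions.
PAID_BACKENDS: Dict[str, List[str]] = {
    "tts": ["Azure TTS"],
    "translation": ["Azure Translator"],
}

# Assumed order of stages for a dubbing run (not confirmed by the source).
STAGE_ORDER: List[str] = [
    "download", "vocal_isolation", "speech_recognition", "translation", "tts",
]


def available_backends(stage: str, paid: bool = False) -> List[str]:
    """Return the backend names selectable for a stage."""
    if stage not in BACKENDS:
        raise KeyError(f"unknown stage: {stage}")
    names = list(BACKENDS[stage])
    if paid:
        names.extend(PAID_BACKENDS.get(stage, []))
    return names


def plan_pipeline(choices: Optional[Dict[str, str]] = None,
                  paid: bool = False) -> List[Tuple[str, str]]:
    """Pick one backend per stage, defaulting to the first one listed."""
    choices = choices or {}
    plan = []
    for stage in STAGE_ORDER:
        options = available_backends(stage, paid)
        backend = choices.get(stage, options[0])
        if backend not in options:
            raise ValueError(f"{backend!r} is not available for stage {stage!r}")
        plan.append((stage, backend))
    return plan


if __name__ == "__main__":
    for stage, backend in plan_pipeline({"speech_recognition": "WhisperX"}):
        print(f"{stage}: {backend}")
```

Keeping the backends behind a stage name is what lets one recognizer or TTS engine be swapped for another without touching the rest of the workflow, which is the practical meaning of "modular" here.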

Quick Start & Requirements

  • Install: Clone the repository and run configure.bat followed by start.bat.
  • Prerequisites: Windows 10/11 (64-bit), NVIDIA GPU with CUDA 12.4 (recommended), 4GB+ VRAM (8GB+ preferred), 4GB+ RAM, 20GB+ storage, internet connection. Linux/Mac are unsupported.
  • Setup Time: Initial setup and dependency downloads can take over 1 hour.
  • Links: YouTube Showcase Demo, Shopify (Paid Version)
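
Before starting the hour-long setup, it may help to confirm the two prerequisites that are easy to check from a script: the operating system and free disk space. The sketch below is an illustration only and is not part of Voice-Pro; `check_prerequisites` and `MIN_FREE_GB` are hypothetical names, and the GPU, CUDA, VRAM, and RAM requirements are not checked.

```python
import platform
import shutil
from typing import List

MIN_FREE_GB = 20  # from the "20GB+ storage" prerequisite above


def check_prerequisites(system: str, free_bytes: int) -> List[str]:
    """Return a list of human-readable problems; empty means both checks pass."""
    problems = []
    if system != "Windows":
        problems.append(f"Unsupported OS: {system} (Windows 10/11 is required)")
    if free_bytes < MIN_FREE_GB * 1024 ** 3:
        free_gb = free_bytes / 1024 ** 3
        problems.append(f"Only {free_gb:.1f} GB free; {MIN_FREE_GB} GB or more is needed")
    return problems


if __name__ == "__main__":
    # platform.system() returns "Windows", "Linux", or "Darwin" (macOS).
    free = shutil.disk_usage(".").free
    issues = check_prerequisites(platform.system(), free)
    print("\n".join(issues) if issues else "OS and disk space look fine")
```

Run it from the directory where the repository will be cloned so that the disk-space check applies to the right drive.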

Highlighted Details

  • Supports zero-shot voice cloning with F5-TTS, E2-TTS, and CosyVoice.
  • Integrates multiple Whisper variants for speech-to-text, including WhisperX.
  • Offers multilingual TTS with Edge-TTS and kokoro, plus YouTube downloading and vocal isolation.
  • Free trial limited to 60-second media processing; subscription offers unlimited usage and Azure services.

Maintenance & Community

The project is actively developed by ABUS, a startup from Korea. Contributions are welcomed via Issues and Pull Requests. Contact email for inquiries: abus.aikorea@gmail.com.

Licensing & Compatibility

The repository builds on open-source components (e.g., Demucs, yt-dlp, Gradio, Whisper). Azure TTS and Azure Translator are available only with a paid subscription. The project itself appears to be free to use, subject to the free trial's 60-second limit on media processing.

Limitations & Caveats

The free trial is restricted to 60 seconds of media processing. Linux and macOS are not supported. Upgrading from v2.x to v3.x requires a clean install. Initial setup and model downloads can take over 1 hour.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 753 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems).

GPT-SoVITS by RVC-Boss

0.5%
49k
Few-shot voice cloning and TTS web UI
created 1 year ago
updated 1 day ago