jhj0517
Web UI for Whisper-based subtitle generation
Top 18.6% on SourcePulse
This project provides a Gradio-based web UI for the Whisper speech-to-text model, enabling easy subtitle generation from various sources like files, YouTube, and microphones. It targets users needing efficient and versatile subtitle creation, offering features like speech-to-text translation and subtitle file translation.
How It Works
The UI integrates multiple Whisper implementations, defaulting to SYSTRAN/faster-whisper for optimized VRAM usage and speed. It supports pre-processing audio with Silero VAD, BGM separation with UVR, and post-processing with pyannote for speaker diarization. Translation capabilities are extended via Facebook NLLB models and the DeepL API.
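The last step of such a pipeline, turning timed transcription segments into a subtitle file, can be sketched in plain Python. This is an illustrative sketch only, not the project's code: the Segment class and both function names are hypothetical, and it assumes the transcription backend yields segments with start and end times in seconds plus the recognized text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Hypothetical container for one transcribed span (times in seconds)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT: a 1-based index, a time range, then the text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Segment(0.0, 1.5, " Hello "), Segment(1.5, 3.0, "World")]
    print(segments_to_srt(demo))
```

Keeping the formatter independent of any particular Whisper implementation means the same function could serve whichever backend produced the segments.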
Quick Start & Requirements
Run install.bat or install.sh to set up dependencies in a virtual environment, then run start-webui.bat or start-webui.sh to launch.
With Docker: build (docker compose build) and run (docker compose up).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Adjust --extra-index-url in requirements.txt for non-Nvidia GPUs or different CUDA versions.