Web UI for Whisper-based subtitle generation
This project provides a Gradio-based web UI for the Whisper speech-to-text model, making it easy to generate subtitles from files, YouTube videos, and microphone input. It is aimed at users who need efficient, versatile subtitle creation, and it also offers speech-to-text translation and subtitle file translation.
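Subtitle generation usually ends with writing an SRT file. The sketch below shows only that last step, under stated assumptions: the Segment class is hypothetical, and times are taken to be seconds from the start of the audio. It is not this project's own writer.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcribed span; start/end are in seconds."""
    start: float
    end: float
    text: str


def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT blocks: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_srt_timestamp(seg.start)} --> {format_srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    # Joining with "\n" leaves one blank line between consecutive blocks.
    return "\n".join(blocks)
```

Segments yielded by faster-whisper's transcribe() also expose start, end and text attributes, so they should be convertible the same way.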
How It Works
The UI integrates multiple Whisper implementations, defaulting to SYSTRAN/faster-whisper for optimized VRAM usage and speed. It supports pre-processing audio with Silero VAD, BGM separation with UVR, and post-processing with pyannote for speaker diarization. Translation capabilities are extended via Facebook NLLB models and the DeepL API.
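pyannote produces speaker turns separately from Whisper's transcript segments, so the two have to be aligned. One common approach, assumed here rather than taken from this project's code, labels each transcript segment with the speaker whose turns overlap it the longest. Turn and assign_speaker are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """Hypothetical diarization turn: `speaker` talks from `start` to `end` (seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(seg_start: float, seg_end: float, turns: list[Turn]) -> Optional[str]:
    """Return the speaker with the most total overlap with the segment, or None."""
    totals: dict[str, float] = defaultdict(float)
    for turn in turns:
        # Sum overlap per speaker, so a speaker's split turns still add up.
        totals[turn.speaker] += overlap(seg_start, seg_end, turn.start, turn.end)
    best = max(totals, key=totals.__getitem__, default=None)
    if best is None or totals[best] == 0.0:
        return None  # no turn overlaps this segment at all
    return best
```

With pyannote.audio, turns can typically be collected from a diarization result via itertracks(yield_label=True), which yields (segment, track, label) tuples. Summing overlap per speaker, rather than picking the single longest turn, keeps the choice stable when one speaker's turn is broken into several pieces.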
Quick Start & Requirements
Run install.bat or install.sh to set up dependencies in a virtual environment, then run start-webui.bat or start-webui.sh to launch the UI. With Docker, build the image (docker compose build) and run it (docker compose up).
Limitations & Caveats
You may need to adjust --extra-index-url in requirements.txt for non-Nvidia GPUs or different CUDA versions.
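As a sketch of that adjustment, the helper below repoints every --extra-index-url line at a PyTorch wheel index. The URL pattern https://download.pytorch.org/whl/<tag> (for example cu118 or cpu) is PyTorch's; the function name and the sample requirements are hypothetical, and the right tag for a given GPU should be checked against PyTorch's install selector.

```python
# PyTorch serves wheels per backend under this pattern, e.g. tag "cu118" or "cpu".
PYTORCH_INDEX = "https://download.pytorch.org/whl/{tag}"


def set_extra_index_url(requirements: str, tag: str) -> str:
    """Hypothetical helper: rewrite each --extra-index-url line to use `tag`."""
    url = PYTORCH_INDEX.format(tag=tag)
    lines = []
    for line in requirements.splitlines():
        if line.strip().startswith("--extra-index-url"):
            line = f"--extra-index-url {url}"  # replace the whole line
        lines.append(line)
    # All other lines are kept unchanged; output always ends with a newline.
    return "\n".join(lines) + "\n"
```

To apply it, read requirements.txt, pass its text through set_extra_index_url, and write the result back before running the install script.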