Whisper-WebUI by jhj0517

Web UI for Whisper-based subtitle generation

created 2 years ago
2,205 stars

Top 20.9% on sourcepulse

Project Summary

This project provides a Gradio-based web UI for the Whisper speech-to-text model that makes it easy to generate subtitles from files, YouTube videos, and microphone input. It is aimed at users who need fast, flexible subtitle creation, and it also offers speech-to-text translation and translation of existing subtitle files.

How It Works

The UI integrates multiple Whisper implementations and defaults to SYSTRAN/faster-whisper for lower VRAM usage and faster inference. Audio can be pre-processed with Silero VAD (voice activity detection) and UVR (background-music separation), and transcripts can be post-processed with pyannote for speaker diarization. Translation is provided through Facebook's NLLB models and the DeepL API.
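
This summary does not show how the project merges pyannote's output into the transcript. The sketch below illustrates one common approach under that assumption: each transcript segment is labeled with the speaker whose diarization turns overlap it for the longest total time. `Segment`, `Turn`, and `assign_speakers` are hypothetical names, and the turns stand in for whatever a diarization model returns.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """A transcribed segment: start/end in seconds, its text, and an optional speaker."""
    start: float
    end: float
    text: str
    speaker: str | None = None


@dataclass
class Turn:
    """A diarization turn (hypothetical shape): one speaker talking from start to end."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals, 0.0 if they are disjoint."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[Segment]:
    """Return copies of the segments labeled with their most-overlapping speaker."""
    labeled = []
    for seg in segments:
        # Sum overlap per speaker, since one speaker may have several turns in a segment.
        totals: dict[str, float] = {}
        for turn in turns:
            dur = overlap(seg.start, seg.end, turn.start, turn.end)
            if dur > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + dur
        # Segments that no turn touches keep speaker=None.
        best = max(totals, key=totals.get) if totals else None
        labeled.append(Segment(seg.start, seg.end, seg.text, best))
    return labeled


if __name__ == "__main__":
    segments = [Segment(0.0, 2.0, "Hello there."), Segment(2.0, 5.0, "Hi, how are you?")]
    turns = [Turn(0.0, 2.2, "SPEAKER_00"), Turn(2.2, 5.0, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(seg.speaker, seg.text)
```

Picking the speaker by total overlap, rather than by the turn active at the segment's midpoint, tolerates small boundary mismatches between the transcription and diarization timelines.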

Quick Start & Requirements

  • Install: Clone the repository and run install.bat or install.sh to set up dependencies in a virtual environment. Run start-webui.bat or start-webui.sh to launch.
  • Prerequisites: Git, Python (3.10-3.12), FFmpeg (on your PATH), and CUDA (12.4 by default; other CUDA versions or non-NVIDIA hardware require manual adjustment).
  • Docker: Build the image with docker compose build, then start it with docker compose up.
  • Resources: The Docker image is about 7 GB. faster-whisper uses significantly less VRAM than the original OpenAI implementation.
  • Docs: Wiki
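
Before running the installer, a quick pre-flight check can catch missing prerequisites. The following is a hypothetical standard-library sketch, not part of the project's install scripts: it compares the interpreter version with the 3.10-3.12 range listed above and uses `shutil.which` to look for `git` and `ffmpeg` on PATH. It does not check CUDA.

```python
from __future__ import annotations

import shutil
import sys

# Supported range taken from the prerequisites above (Python 3.10 to 3.12, inclusive).
SUPPORTED_PYTHON = ((3, 10), (3, 12))


def python_supported(version: tuple[int, int]) -> bool:
    """True if (major, minor) falls within the supported Python range."""
    low, high = SUPPORTED_PYTHON
    return low <= version <= high


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites are visible from the current interpreter."""
    return {
        "python": python_supported(tuple(sys.version_info[:2])),
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:>7}: {'ok' if ok else 'MISSING'}")
```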

Highlighted Details

  • Supports OpenAI/Whisper, SYSTRAN/faster-whisper, and insanely-fast-whisper.
  • Generates SRT, WebVTT, and TXT subtitle formats.
  • Includes speech-to-text translation to English and subtitle file translation.
  • Integrates speaker diarization via pyannote (requires a Hugging Face token and acceptance of the model's terms).
  • Offers BGM separation using UVR.
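
The three output formats differ mainly in how cues and timestamps are written: SRT numbers each cue and uses a comma before the milliseconds, WebVTT starts with a `WEBVTT` header and uses a dot, and plain TXT keeps only the text. The sketch below shows that difference for segments given as hypothetical `(start_seconds, end_seconds, text)` tuples; it is an illustration of the formats, not the project's writer code.

```python
from __future__ import annotations


def format_timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """SRT: numbered cues, comma decimals, a blank line between cues."""
    cues = [
        f"{i}\n{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n{text.strip()}\n"
        for i, (start, end, text) in enumerate(segments, start=1)
    ]
    return "\n".join(cues)


def to_vtt(segments: list[tuple[float, float, str]]) -> str:
    """WebVTT: 'WEBVTT' header line, then unnumbered cues with dot decimals."""
    cues = [
        f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n{text.strip()}\n"
        for start, end, text in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


def to_txt(segments: list[tuple[float, float, str]]) -> str:
    """Plain text: one line per segment, timestamps dropped."""
    return "".join(f"{text.strip()}\n" for _, _, text in segments)


if __name__ == "__main__":
    segs = [(0.0, 2.5, " Hello there."), (2.5, 3661.042, " Bye.")]
    print(to_srt(segs))
    print(to_vtt(segs))
    print(to_txt(segs))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids float artifacts such as a cue ending at `00:00:02,499` instead of `00:00:02,500`.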

Maintenance & Community

  • Actively developed, with several items in the TODO list already marked as completed.
  • Community contributions for translations are welcomed.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • Requires manually editing --extra-index-url in requirements.txt for non-NVIDIA GPUs or CUDA versions other than the default.
  • Pyannote speaker diarization requires manually setting up a Hugging Face token and accepting the model's terms.
  • Real-time microphone transcription is listed as a future feature.
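
The --extra-index-url edit can be scripted. The sketch below assumes, without confirmation from this summary, that requirements.txt points PyTorch at a wheel index of the form https://download.pytorch.org/whl/<tag> (tags such as cu124, cu118, or cpu), and that replacing that line is all that is needed. `set_extra_index` is a hypothetical helper, and the sample requirements text is illustrative only.

```python
from __future__ import annotations

# PyTorch wheel index; the tag selects the build (e.g. "cu124", "cu118", "cpu").
TORCH_INDEX = "https://download.pytorch.org/whl/{tag}"


def set_extra_index(requirements: str, tag: str) -> str:
    """Point every --extra-index-url line at the PyTorch index for `tag`.

    If no such line exists, one is added at the top of the file.
    """
    new_line = f"--extra-index-url {TORCH_INDEX.format(tag=tag)}"
    out = [
        new_line if line.strip().startswith("--extra-index-url") else line
        for line in requirements.splitlines()
    ]
    if new_line not in out:
        out.insert(0, new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative requirements text, not the repository's actual file.
    sample = "--extra-index-url https://download.pytorch.org/whl/cu124\ntorch\ntorchaudio\n"
    print(set_extra_index(sample, "cu118"), end="")
```
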

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 4
  • Issues (30d): 8
  • Star History: 299 stars in the last 90 days