Whisper-WebUI by jhj0517

Web UI for Whisper-based subtitle generation

Created 2 years ago

2,613 stars

Top 17.8% on SourcePulse

Project Summary

This project provides a Gradio-based web UI for the Whisper speech-to-text model. It generates subtitles from uploaded files, YouTube links, and microphone input, and also offers speech-to-text translation and translation of existing subtitle files.

How It Works

The UI integrates multiple Whisper implementations, defaulting to SYSTRAN/faster-whisper for optimized VRAM usage and speed. It supports pre-processing audio with Silero VAD, BGM separation with UVR, and post-processing with pyannote for speaker diarization. Translation capabilities are extended via Facebook NLLB models and the DeepL API.

Quick Start & Requirements

Install: Clone the repository and run install.bat (Windows) or install.sh (Linux/macOS) to set up dependencies in a virtual environment. Then run start-webui.bat or start-webui.sh to launch the UI.
Prerequisites: Git, Python (3.10-3.12), FFmpeg (added to PATH), and CUDA (defaulting to 12.4, requires manual adjustment for other versions/hardware).
Docker: Build image (docker compose build) and run (docker compose up).
Resources: The Docker image is ~7 GB. faster-whisper significantly reduces VRAM usage compared to the original OpenAI implementation.
Docs: Wiki
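
The prerequisites above can be checked before running the install script. The following is a minimal sketch, not part of the project: the function name check_prerequisites and its parameters are hypothetical, and it only covers the Python version range, Git, and FFmpeg (it does not detect CUDA).

```python
import shutil
import sys

# Supported interpreter range as stated in the prerequisites (3.10 through 3.12).
MIN_PYTHON = (3, 10)
MAX_PYTHON = (3, 12)
# Command-line tools that must be reachable through PATH.
REQUIRED_TOOLS = ("git", "ffmpeg")


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of problems; an empty list means every check passed.

    `version` and `which` are injectable so the check can be exercised
    without touching the real environment (hypothetical helper, not a
    Whisper-WebUI API).
    """
    major_minor = tuple((version or sys.version_info)[:2])
    problems = []
    if not (MIN_PYTHON <= major_minor <= MAX_PYTHON):
        found = ".".join(str(part) for part in major_minor)
        problems.append(f"Python {found} is outside the supported range 3.10-3.12")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Called with no arguments, the function inspects the running interpreter and the current PATH.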

Highlighted Details

Supports OpenAI/Whisper, SYSTRAN/faster-whisper, and insanely-fast-whisper.
Generates SRT, WebVTT, and TXT subtitle formats.
Includes speech-to-text translation to English and subtitle file translation.
Integrates speaker diarization via pyannote (requires Huggingface token and term acceptance).
Offers BGM separation using UVR.
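
To illustrate how the SRT and WebVTT outputs listed above differ, here is a small standalone sketch that renders the same timed segments in both formats. It is not the project's own writer: the function names and the (start, end, text) tuple layout are assumptions made for the example.

```python
def format_timestamp(seconds, decimal_marker):
    """Render seconds as HH:MM:SS<marker>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_srt(segments):
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """Build WebVTT text: a WEBVTT header followed by unnumbered cues."""
    cues = []
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        cues.append(f"{timing}\n{text.strip()}\n")
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    demo = [(0.0, 1.5, "Hello"), (1.5, 3.0, "world")]
    print(to_srt(demo))
    print(to_vtt(demo))
```

The two formats carry the same cues. SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a header line and uses a period.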

Maintenance & Community

Active development with several features marked as completed in the TODO list.
Community contributions for translations are welcomed.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README.

Limitations & Caveats

Requires manual configuration of --extra-index-url in requirements.txt for non-Nvidia GPUs or different CUDA versions.
Pyannote speaker diarization requires manual Huggingface token setup and term acceptance.
Real-time microphone transcription is listed as a future feature.
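
The --extra-index-url adjustment mentioned in the first caveat can be scripted. The sketch below assumes that requirements.txt contains a single --extra-index-url line pointing at a PyTorch wheel index, and that variant tags such as cu121 or cpu follow PyTorch's index naming. The helper name set_extra_index_url and the sample file contents are hypothetical, so check the actual requirements.txt before relying on it.

```python
# Base of the PyTorch wheel index; the variant tag (e.g. "cu124", "cpu") is appended.
PYTORCH_INDEX = "https://download.pytorch.org/whl/"


def set_extra_index_url(requirements_text, variant):
    """Return requirements text whose --extra-index-url points at `variant`.

    Existing --extra-index-url lines are replaced; if none is present,
    one is inserted at the top. All other lines are left untouched.
    """
    new_line = f"--extra-index-url {PYTORCH_INDEX}{variant}"
    lines = requirements_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith("--extra-index-url"):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.insert(0, new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "--extra-index-url https://download.pytorch.org/whl/cu124\ntorch\nfaster-whisper\n"
    print(set_extra_index_url(sample, "cu121"))
```

In practice the returned text would be written back to requirements.txt before running the install script.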

Health Check

Last Commit: 1 week ago
Responsiveness: 1 day
Pull Requests (30d): 3
Issues (30d): 6

Star History

63 stars in the last 30 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai).

insanely-fast-whisper-cli by ochen1

CLI tool for optimized, fast Whisper-based ASR

Created 2 years ago

Updated 1 year ago

whisper-website by Kabanosk

Web app for local speech-to-text using Whisper

Created 3 years ago

Updated 4 months ago

Speech-Translate by Dadangdut33

Speech-to-text app using Whisper for transcription and translation

Created 3 years ago

Updated 2 years ago

subgen by McCloudS

Subtitle generator for media servers

Created 2 years ago

Updated 1 day ago

subsai by absadiki

Subtitle generation tool (Web-UI + CLI + Python package) using Whisper

Created 2 years ago

Updated 4 months ago

auto-subs by tmoroney

DaVinci Resolve script for subtitle generation and speaker diarization

Created 2 years ago

Updated 1 week ago

N46Whisper by Ayanaminn

Colab notebook for streamlined video subtitle generation

Created 3 years ago

Updated 10 months ago

Starred by Omar Sanseviero (DevRel at Google DeepMind).

whisper-plus by kadirnar

Speech-to-text toolkit for enhanced audio processing

Created 2 years ago

Updated 1 month ago

Starred by Travis Fischer (Founder of Agentic) and Didier Lopes (Founder of OpenBB).

yt-whisper by m1guelpf

CLI tool for generating YouTube subtitles

Created 3 years ago

Updated 2 years ago

LanguageLeapAI by SociallyIneptWeeb

Real-time AI translator for cross-lingual online communication

Created 2 years ago

Updated 2 years ago

auto-subtitle by m1guelpf

CLI tool for automatic video subtitling

Created 3 years ago

Updated 1 year ago

Starred by Abubakar Abid (Cofounder of Gradio).

voice-pro by abus-aikorea

WebUI for speech recognition, translation, and dubbing

Created 1 year ago

Updated 1 month ago
