Whisperer by tigros

CLI tool for batch speech-to-text using Whisper

created 2 years ago
299 stars

Top 90.0% on sourcepulse

Project Summary

Whisperer is a batch speech-to-text tool designed for generating subtitles from video and audio files. It leverages OpenAI's Whisper model, specifically the GPU-accelerated whisper.cpp implementation, to process multiple files concurrently, scaling with available GPU memory.

How It Works

Whisperer uses whisper.cpp for efficient, GPU-accelerated inference of the Whisper model. For batch processing, it launches multiple concurrent model instances, sized to the user's available GPU memory, to maximize throughput. This approach offers significant speed improvements over CPU-based processing.
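The scaling idea above can be sketched in a few lines of Python. This is an illustrative sketch, not Whisperer's actual code: `plan_workers`, `transcribe_all`, and the assumed 1.5 GB per-instance footprint are hypothetical; the real footprint depends on the model you download.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def plan_workers(gpu_mem_gb: float, model_mem_gb: float = 1.5) -> int:
    """Estimate how many model instances fit in GPU memory.

    model_mem_gb is an assumed per-instance footprint (hypothetical);
    at least one worker is always returned.
    """
    if model_mem_gb <= 0:
        raise ValueError("model_mem_gb must be positive")
    return max(1, math.floor(gpu_mem_gb / model_mem_gb))


def transcribe_all(files, gpu_mem_gb, run_one):
    """Run run_one(file) across files, bounding concurrency by GPU memory."""
    workers = plan_workers(gpu_mem_gb)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, files))
```

For example, an 8 GB GPU with a 1.5 GB-per-instance assumption yields five concurrent workers; each worker would invoke one whisper.cpp process on one input file.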

Quick Start & Requirements

  • Install via pip install whisperer.
  • Requires ffmpeg to be in the system's PATH.
  • Download Whisper models from Hugging Face (e.g., the ggerganov/whisper.cpp repository); do not use v3 models, as they are not yet supported.
  • GPU with sufficient memory is required for optimal performance.
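The PATH requirement and model download above can be checked from a shell. This is a sketch, not part of Whisperer itself: the `command -v` check and the commented-out download are illustrative, and the URL follows the standard Hugging Face "resolve" pattern for the ggerganov/whisper.cpp model repository.

```shell
# Verify ffmpeg is discoverable on PATH (required by Whisperer)
if command -v ffmpeg >/dev/null 2>&1; then
  echo "ffmpeg found"
else
  echo "ffmpeg missing: install it and add it to PATH"
fi

# Download a non-v3 ggml model from Hugging Face (uncomment to run;
# pick any model except the unsupported v3 variants):
# curl -L -o ggml-base.en.bin \
#   https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```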

Highlighted Details

  • Utilizes whisper.cpp for significant GPU-accelerated speed improvements.
  • Supports batch processing, scaling with GPU memory.
  • Generates subtitles for video and audio files.

Maintenance & Community

No specific community channels or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The project explicitly states that v3 Whisper models are not yet supported. ffmpeg is a required external dependency not included with the package.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 90 days

