insanely-fast-whisper-cli by ochen1

CLI tool for optimized, fast Whisper-based ASR

created 1 year ago
364 stars

Top 78.4% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This project provides a command-line interface for highly optimized automatic speech recognition (ASR) using OpenAI's Whisper models. It targets users who need fast, efficient audio transcription, offering significant speed improvements on long audio files along with timestamp generation for subtitles.

How It Works

The CLI leverages Hugging Face's Transformers and Optimum libraries to implement various performance optimizations. Users can select different Whisper model sizes, including English-only variants, and tune processing via parameters such as batch size, data type (float16/float32), and BetterTransformer acceleration. Together, these optimizations aim to drastically reduce transcription time; the project claims to transcribe 300 minutes of audio in under 10 minutes with Whisper Large v2.
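
As a minimal sketch of the kind of setup these libraries expose (not this project's exact code; the model name, batch size, and chunk length below are illustrative), the same optimizations can be reproduced with the Transformers ASR pipeline:

    import torch
    from transformers import pipeline

    # Load a Whisper checkpoint in half precision on the GPU.
    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-base",   # e.g. openai/whisper-large-v2 for best accuracy
        torch_dtype=torch.float16,     # float16 reduces memory use and speeds up GPU inference
        device="cuda:0",
    )

    # BetterTransformer (via the Optimum library) swaps in fused attention kernels.
    asr.model = asr.model.to_bettertransformer()

    # Chunked, batched long-form transcription with timestamps for subtitles.
    result = asr(
        "your_audio_file.wav",
        chunk_length_s=30,
        batch_size=8,
        return_timestamps=True,
    )
    print(result["text"])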

Quick Start & Requirements

  • Install via pip install -r requirements.txt after cloning the repository.
  • Requires Python and a CUDA-enabled GPU for optimal performance (a quick environment check is sketched after this list).
  • Usage example: insanely-fast-whisper --model openai/whisper-base --device cuda:0 --dtype float32 --batch-size 8 --better-transformer --chunk-length 30 your_audio_file.wav
  • Official documentation is available via the GitHub repository.
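
Before running the usage example above, a short Python check (illustrative, not part of the CLI) confirms that a CUDA GPU is visible, which is what the GPU-oriented flags assume:

    import torch

    # The optimizations target GPU inference; CPU-only runs will be much slower.
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))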

Highlighted Details

  • Claims 300 minutes of audio transcribed in <10 minutes with Whisper Large v2.
  • Supports customizable batch size, data type, and BetterTransformer.
  • Generates SRT output with accurate timestamps (a conversion sketch follows this list).
  • Offers choice of Hugging Face ASR models, including English-only variants.
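
To illustrate the SRT output, the snippet below (a sketch under the assumption of a Transformers pipeline result, not this project's actual writer) converts chunk-level timestamps into SRT entries:

    def to_srt(chunks):
        """Turn pipeline chunks ({"timestamp": (start, end), "text": ...}) into SRT text."""
        def fmt(seconds):
            # SRT timestamps use the HH:MM:SS,mmm format.
            ms_total = int(seconds * 1000)
            h, rem = divmod(ms_total, 3_600_000)
            m, rem = divmod(rem, 60_000)
            s, ms = divmod(rem, 1000)
            return f"{h:02}:{m:02}:{s:02},{ms:03}"

        entries = []
        for i, chunk in enumerate(chunks, start=1):
            start, end = chunk["timestamp"]
            end = end if end is not None else start  # the final chunk may lack an end time
            entries.append(f"{i}\n{fmt(start)} --> {fmt(end)}\n{chunk['text'].strip()}\n")
        return "\n".join(entries)

    # Example: result = asr("your_audio_file.wav", return_timestamps=True)
    # open("output.srt", "w").write(to_srt(result["chunks"]))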

Maintenance & Community

The project is developed by @ochen1 and acknowledges Vaibhavs10/insanely-fast-whisper. Community interaction and feedback are encouraged through GitHub issues.

Licensing & Compatibility

Licensed under the MIT License. This permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Performance claims are specific to Whisper Large v2 and may vary with other models or hardware. The primary optimizations are geared towards GPU acceleration, with potential performance degradation on CPU-only systems.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

15 stars in the last 90 days.
