CLI tool for optimized, fast Whisper-based ASR
Top 78.4% on sourcepulse
This project provides a command-line interface for highly optimized automatic speech recognition (ASR) using OpenAI's Whisper models. It targets users needing fast and efficient audio transcription, offering significant speed improvements for processing large audio files with features like timestamp generation for subtitles.
How It Works
The CLI leverages Hugging Face's Transformers and Optimum libraries to implement several performance optimizations. Users can select different Whisper model sizes, including English-only variants, and tune processing through parameters such as batch size, data type (float16/float32), and BetterTransformer. These optimizations can drastically reduce transcription time: the project claims that 300 minutes of audio can be transcribed in under 10 minutes with Whisper Large v2.
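The combination described above can be sketched with the underlying Transformers/Optimum APIs. This is an illustrative sketch, not the tool's actual internals; the model ID and parameter defaults are assumptions chosen to mirror the CLI's flags.

```python
import torch
from transformers import pipeline


def build_asr_pipeline(model_id="openai/whisper-base.en",
                       device="cuda:0",
                       dtype=torch.float16,
                       use_bettertransformer=True):
    """Build an optimized Whisper ASR pipeline (illustrative sketch)."""
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,        # any Whisper size, incl. English-only ".en" variants
        torch_dtype=dtype,     # float16 halves memory use and speeds up GPU inference
        device=device,
    )
    if use_bettertransformer:
        # Optimum's fused-attention kernels (requires the `optimum` package)
        asr.model = asr.model.to_bettertransformer()
    return asr


def transcribe(asr, audio_path, chunk_length_s=30, batch_size=8):
    # Chunked, batched inference; timestamps support subtitle generation
    return asr(audio_path,
               chunk_length_s=chunk_length_s,
               batch_size=batch_size,
               return_timestamps=True)
```

Chunking splits long audio into fixed-length windows so batches of chunks can be transcribed in parallel, which is where most of the speedup on large files comes from.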
Quick Start & Requirements
After cloning the repository, install the dependencies:

pip install -r requirements.txt

Then run the CLI, for example:

insanely-fast-whisper --model openai/whisper-base --device cuda:0 --dtype float32 --batch-size 8 --better-transformer --chunk-length 30 your_audio_file.wav
Maintenance & Community
The project is developed by @ochen1 and acknowledges contributions from Vaibhavs10/insanely-fast-whisper. Community interaction and feedback are encouraged through GitHub issues.
Licensing & Compatibility
Licensed under the MIT License. This permissive license allows for commercial use and integration into closed-source projects.
Limitations & Caveats
Performance claims are specific to Whisper Large v2 and may vary with other models or hardware. The primary optimizations are geared towards GPU acceleration, with potential performance degradation on CPU-only systems.
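A minimal sketch of how a caller might guard against CPU-only systems, assuming PyTorch is available; the CLI's actual device handling may differ.

```python
import torch

# Fall back gracefully when no CUDA GPU is present: float16 is slow or
# unsupported on CPU, so pair the CPU path with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device.startswith("cuda") else torch.float32
```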