Fast Whisper transcription CLI
This project provides an opinionated command-line interface (CLI) for accelerated on-device audio transcription with OpenAI's Whisper models. It targets users who need to process large audio files quickly, and reports transcribing 150 minutes of audio in under 2 minutes on high-end GPUs.
How It Works
The CLI builds on Hugging Face Transformers, Optimum, and Flash Attention 2. It uses FP16 precision, batching, and optimized attention to cut transcription time substantially compared with standard implementations. The project also supports speaker diarization through integration with pyannote.audio.
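The combination described above (FP16, batched chunked inference, and an optimized attention backend) can be sketched with the Transformers `pipeline` API. This is a minimal illustration, not the project's actual code: the helper names `build_asr_config` and `transcribe` are hypothetical, and the model name `openai/whisper-large-v3`, the batch size of 24, and the 30-second chunk length are assumed values. Flash Attention 2 requires a CUDA GPU and the separate `flash-attn` package, so the sketch falls back to PyTorch's SDPA attention on other devices.

```python
from typing import Any, Dict


def build_asr_config(
    device: str = "cuda:0",
    use_flash: bool = True,
    batch_size: int = 24,
) -> Dict[str, Dict[str, Any]]:
    """Return keyword arguments for creating and calling an ASR pipeline.

    All concrete values here (model, batch size, chunk length) are
    illustrative assumptions, not the CLI's confirmed defaults.
    """
    # Flash Attention 2 only runs on CUDA; use PyTorch SDPA elsewhere.
    on_cuda = device.startswith("cuda")
    attn = "flash_attention_2" if (use_flash and on_cuda) else "sdpa"
    return {
        "init": {
            "task": "automatic-speech-recognition",
            "model": "openai/whisper-large-v3",  # assumed model name
            "device": device,
            "model_kwargs": {"attn_implementation": attn},
        },
        "call": {
            "chunk_length_s": 30,       # split long audio into 30 s chunks
            "batch_size": batch_size,   # chunks decoded together per step
            "return_timestamps": True,
        },
    }


def transcribe(path: str, device: str = "cuda:0", use_flash: bool = True) -> Dict[str, Any]:
    """Transcribe an audio file; needs torch and transformers installed."""
    # Heavy imports are deferred so the config helper works without them.
    import torch
    from transformers import pipeline

    cfg = build_asr_config(device, use_flash)
    # FP16 weights halve memory use and speed up inference on GPUs.
    pipe = pipeline(torch_dtype=torch.float16, **cfg["init"])
    return pipe(path, **cfg["call"])
```

Calling `transcribe("audio.mp3")` on a machine with a supported GPU would return a dictionary containing the transcript text and timestamped chunks. Lowering `batch_size` is the usual first step if the GPU runs out of memory.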
Quick Start & Requirements
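Install: pipx install insanely-fast-whisper
Supported devices: NVIDIA GPUs (cuda) and Apple Silicon Macs (mps).
To use Flash Attention 2, install it into the CLI's environment: pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation
Highlighted Details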
Works with distil-whisper/large-v2 and Flash Attention 2.
CLI options cover device (cuda or mps), task (transcribe/translate), language detection, and timestamp granularity.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats