insanely-fast-whisper by Vaibhavs10

Fast Whisper transcription CLI

Created 2 years ago

8,815 stars

Top 5.8% on SourcePulse


12 Experts Love This Project

Pietro Schirano

Founder of MagicPath

Jonathan Ragan-Kelley

Professor at MIT

Tim J. Baek

Founder of Open WebUI

Luis Capelo

Cofounder of Lightning AI

and 8 more!

Project Summary

This project provides an opinionated command-line interface (CLI) for fast, on-device audio transcription with OpenAI's Whisper models. It is aimed at users who need to process long audio files quickly: on high-end GPUs it can transcribe up to 150 minutes of audio in under 2 minutes.
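To put "150 minutes in under 2 minutes" in perspective, the speed can be expressed as a real-time factor: seconds of audio transcribed per second of wall-clock time. The helper below is a hypothetical illustration, not part of the CLI. It applies that arithmetic to the figures quoted in this summary, namely the under-2-minute bound here and the ~78 s and ~98 s timings listed under Highlighted Details.

```python
def realtime_factor(audio_minutes: float, wall_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time (hypothetical helper)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return audio_minutes * 60 / wall_seconds


# 150 minutes of audio, using the timings quoted in this summary.
print(round(realtime_factor(150, 78), 1))  # distil-whisper/large-v2 + Flash Attention 2 -> 115.4
print(round(realtime_factor(150, 98), 1))  # Whisper Large v3 + Flash Attention 2 -> 91.8
print(realtime_factor(150, 120))           # the "under 2 minutes" bound -> 75.0
```

In other words, "under 2 minutes" for 150 minutes of audio means the CLI runs at least 75 times faster than real time.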

How It Works

The CLI builds on Hugging Face Transformers, Optimum, and Flash Attention 2. It runs Whisper in FP16 precision, decodes audio chunks in batches, and uses optimized attention kernels, which together cut transcription time far below that of standard implementations. It also supports speaker diarization through integration with pyannote.audio.
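The combination of FP16, batching, and Flash Attention 2 described above can be reproduced with a plain Transformers ASR pipeline. Below is a minimal sketch under stated assumptions; it is not the CLI's own source. The helper names `pipeline_kwargs` and `transcribe` are hypothetical. The `"sdpa"` fallback used when Flash Attention 2 is off is this sketch's choice, and the default model name and batch size of 24 are assumed values. The heavy imports are deferred so that the configuration helper can be inspected without a GPU.

```python
def pipeline_kwargs(model_name: str, device: str, use_flash: bool) -> dict:
    """Keyword arguments for transformers.pipeline (hypothetical helper, assumed setup)."""
    return {
        "task": "automatic-speech-recognition",
        "model": model_name,
        "device": device,  # e.g. "cuda:0" or "mps"
        # Flash Attention 2 is selected via the model's attention implementation;
        # otherwise fall back to PyTorch scaled-dot-product attention (an assumption).
        "model_kwargs": {
            "attn_implementation": "flash_attention_2" if use_flash else "sdpa"
        },
    }


def transcribe(path: str, model_name: str = "openai/whisper-large-v3",
               device: str = "cuda:0", use_flash: bool = False,
               batch_size: int = 24) -> dict:
    """Transcribe one file with FP16 weights and batched 30-second chunks (sketch)."""
    import torch  # deferred: only needed when actually transcribing
    from transformers import pipeline

    kwargs = pipeline_kwargs(model_name, device, use_flash)
    kwargs["torch_dtype"] = torch.float16  # half precision weights and activations
    pipe = pipeline(**kwargs)
    # Long audio is split into 30-second chunks that are decoded batch_size at a time.
    return pipe(path, chunk_length_s=30, batch_size=batch_size, return_timestamps=True)
```

Usage on a CUDA machine would look like `transcribe("audio.mp3", use_flash=True)["text"]`. Raising `batch_size` trades GPU memory for throughput. This trade-off is why the CLI exposes batch size as an option.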

Quick Start & Requirements

Install with: pipx install insanely-fast-whisper
Requires an NVIDIA GPU with CUDA or a Mac with Apple Silicon (mps backend).
Flash Attention 2 may need to be installed manually: pipx runpip insanely-fast-whisper install flash-attn --no-build-isolation
Official docs: https://github.com/Vaibhavs10/insanely-fast-whisper
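Because the CLI only runs on CUDA GPUs or Apple Silicon, it can help to check which backend is available before choosing a device. The sketch below assumes that the CLI's device flag takes a GPU index for CUDA (for example "0") or the string "mps". That convention should be confirmed with `insanely-fast-whisper --help`. The function names are hypothetical. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch calls.

```python
def choose_device_id(cuda_available: bool, mps_available: bool, gpu_index: int = 0) -> str:
    """Pick a device value: a CUDA GPU index as a string, else "mps" (assumed convention)."""
    if cuda_available:
        return str(gpu_index)
    if mps_available:
        return "mps"
    raise RuntimeError("No CUDA GPU or Apple Silicon (mps) backend detected")


def detect_device_id(gpu_index: int = 0) -> str:
    """Query PyTorch for the available backend (hypothetical helper)."""
    import torch  # deferred so choose_device_id can be used without torch installed

    return choose_device_id(
        torch.cuda.is_available(), torch.backends.mps.is_available(), gpu_index
    )
```

Separating the pure decision from the PyTorch query keeps the selection logic testable on machines that have neither backend.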

Highlighted Details

Transcribes 150 minutes of audio in ~1 minute 18 seconds with distil-whisper/large-v2 and Flash Attention 2.
Transcribes the same 150 minutes in ~1 minute 38 seconds with Whisper Large v3 and Flash Attention 2.
Includes options for batch size, device selection (cuda or mps), task (transcribe/translate), language detection, and timestamp granularity.
Offers speaker diarization with configurable speaker counts.
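The options listed above map onto command-line flags. The sketch below assembles an invocation as an argument list that could be passed to `subprocess.run(args, check=True)`. Its flag names are assumptions based on the project README (`--file-name`, `--device-id`, `--batch-size`, `--task`, `--language`, `--timestamp`, `--flash`, `--hf-token`, `--num-speakers`) and should be checked against `insanely-fast-whisper --help` before use. The helper `build_cli_args` and its defaults are hypothetical. The sketch also assumes that diarization requires a Hugging Face token for the pyannote.audio models.

```python
import shlex


def build_cli_args(file_name, device_id="0", batch_size=24, task="transcribe",
                   language=None, timestamp="chunk", flash=False,
                   hf_token=None, num_speakers=None):
    """Assemble an insanely-fast-whisper command line (flag names are assumptions)."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    if timestamp not in ("chunk", "word"):
        raise ValueError("timestamp must be 'chunk' or 'word'")
    if num_speakers is not None and (hf_token is None or num_speakers < 1):
        raise ValueError("num_speakers needs an hf_token (diarization) and must be >= 1")

    args = ["insanely-fast-whisper", "--file-name", file_name,
            "--device-id", device_id, "--batch-size", str(batch_size),
            "--task", task, "--timestamp", timestamp]
    if language:  # omit to let Whisper detect the language
        args += ["--language", language]
    if flash:
        args += ["--flash", "True"]
    if hf_token:  # enables speaker diarization via pyannote.audio
        args += ["--hf-token", hf_token]
        if num_speakers is not None:
            args += ["--num-speakers", str(num_speakers)]
    return args


# Example: a Mac run with a small batch size, printed as a shell command.
print(shlex.join(build_cli_args("meeting.mp3", device_id="mps", batch_size=4)))
```

Building a list instead of a single string avoids shell-quoting issues with file names and tokens.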

Maintenance & Community

The README highlights community showcases and contributions.
Community projects linked from the README: https://github.com/ochen1/insanely-fast-whisper-cli, https://github.com/arihanv/Shush, https://github.com/kadirnar/whisper-plus.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The CLI is opinionated and primarily targets NVIDIA GPUs and Apple Silicon Macs. On Windows, enabling CUDA may require manually installing a CUDA-enabled PyTorch build. The MPS backend on Macs uses more memory than the CUDA backend.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

36 stars in the last 30 days