CLI tool for optimized, fast Whisper-based ASR
Top 78.4% on sourcepulse
This project provides a command-line interface for highly optimized automatic speech recognition (ASR) using OpenAI's Whisper models. It targets users needing fast and efficient audio transcription, offering significant speed improvements for processing large audio files with features like timestamp generation for subtitles.
How It Works
The CLI leverages Hugging Face's Transformers and Optimum libraries to implement several performance optimizations. Users can select different Whisper model sizes, including English-only variants, and tune processing through parameters such as batch size, data type (float16/float32), and BetterTransformer. These optimizations can drastically reduce transcription time: the project claims that 300 minutes of audio can be transcribed in under 10 minutes with Whisper Large v2.
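The combination described above can be sketched with the underlying Transformers/Optimum APIs. This is an illustrative sketch, not the tool's actual internals; the model ID and parameter defaults are assumptions chosen to mirror the CLI's flags.

```python
import torch
from transformers import pipeline


def build_asr_pipeline(model_id="openai/whisper-base.en",
                       device="cuda:0",
                       dtype=torch.float16,
                       use_bettertransformer=True):
    """Build an optimized Whisper ASR pipeline (illustrative sketch)."""
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_id,        # any Whisper size, incl. English-only ".en" variants
        torch_dtype=dtype,     # float16 halves memory use and speeds up GPU inference
        device=device,
    )
    if use_bettertransformer:
        # Optimum's fused-attention kernels (requires the `optimum` package)
        asr.model = asr.model.to_bettertransformer()
    return asr


def transcribe(asr, audio_path, chunk_length_s=30, batch_size=8):
    # Chunked, batched inference; timestamps support subtitle generation
    return asr(audio_path,
               chunk_length_s=chunk_length_s,
               batch_size=batch_size,
               return_timestamps=True)
```

Chunking splits long audio into fixed-length windows so batches of chunks can be transcribed in parallel, which is where most of the speedup on large files comes from.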
Quick Start & Requirements
After cloning the repository, install the dependencies:

pip install -r requirements.txt

Then run the CLI, for example:

insanely-fast-whisper --model openai/whisper-base --device cuda:0 --dtype float32 --batch-size 8 --better-transformer --chunk-length 30 your_audio_file.wav
Maintenance & Community
The project is developed by @ochen1 and acknowledges contributions from Vaibhavs10/insanely-fast-whisper. Community interaction and feedback are encouraged through GitHub issues.
Licensing & Compatibility
Licensed under the MIT License. This permissive license allows for commercial use and integration into closed-source projects.
Limitations & Caveats
Performance claims are specific to Whisper Large v2 and may vary with other models or hardware. The primary optimizations are geared towards GPU acceleration, with potential performance degradation on CPU-only systems.
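A minimal sketch of how a caller might guard against CPU-only systems, assuming PyTorch is available; the CLI's actual device handling may differ.

```python
import torch

# Fall back gracefully when no CUDA GPU is present: float16 is slow or
# unsupported on CPU, so pair the CPU path with float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device.startswith("cuda") else torch.float32
```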