CLI tool for faster Whisper transcription/translation
This project provides a command-line interface for the Whisper speech-to-text model, optimized for performance using CTranslate2. It targets users who need faster and more memory-efficient transcription and translation than the original OpenAI Whisper implementation, offering a seamless migration path.
How It Works
The client leverages the CTranslate2 library, a fast inference engine for Transformer models, to run Whisper. This approach enables significant speedups (up to 4x) and reduced memory usage by employing optimized kernels and quantization techniques (INT8, FP16). It supports batched inference for further performance gains and integrates a Voice Activity Detection (VAD) filter for improved processing of speech segments.
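To make the quantization idea concrete: INT8 quantization maps floating-point weights onto 8-bit integers plus a scale factor, roughly quartering memory for weight storage. The snippet below is a toy, pure-Python sketch of symmetric per-tensor quantization to illustrate the principle — it is not CTranslate2's actual kernel code, and the function names are made up for this example.

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: pick a scale so the
    # largest-magnitude weight maps to +/-127, then round each
    # weight to the nearest integer step.
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate FP values; error is bounded by scale/2.
    return [q * scale for q in quantized]

q, s = quantize_int8([0.5, -1.27, 0.02])
print(q)                  # integers in [-127, 127]
print(dequantize(q, s))   # close to the original weights
```

Real inference engines apply this per layer (often per channel) and use integer matrix-multiply kernels on the quantized values, which is where the speed and memory savings come from.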
Quick Start & Requirements
```shell
pip install -U whisper-ctranslate2
docker pull ghcr.io/softcatala/whisper-ctranslate2:latest
```
Highlighted Details

- Quantization via --compute_type (e.g. INT8, FP16) for lower memory use and faster inference
- Voice Activity Detection (VAD) filtering to skip non-speech segments
- Live transcription from the microphone
- Experimental speaker diarization via pyannote.audio, which requires a Hugging Face token and accepting the terms of the underlying models
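The VAD filter decides which stretches of audio contain speech so the model skips silence. The project's actual filter is a neural VAD (inherited from faster-whisper); the toy sketch below only illustrates the underlying idea — segmenting audio by per-frame energy against a threshold — and is not the implementation this tool uses.

```python
def vad_segments(samples, frame_size=4, threshold=0.1):
    # Toy energy-based VAD: a frame counts as speech when its
    # mean absolute amplitude meets the threshold; adjacent
    # speech frames merge into (start, end) sample ranges.
    segments = []
    start = None
    n_frames = len(samples) // frame_size
    for i in range(n_frames):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        energy = sum(abs(s) for s in frame) / frame_size
        if energy >= threshold:
            if start is None:
                start = i * frame_size
        elif start is not None:
            segments.append((start, i * frame_size))
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_size))
    return segments

audio = [0.0] * 4 + [0.5] * 8 + [0.0] * 4
print(vad_segments(audio))  # one speech segment in the middle
```

Neural VADs replace the fixed energy threshold with a learned classifier per frame, which is far more robust to background noise, but the segment-merging logic is similar in spirit.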
Limitations & Caveats
Translation is currently limited to English as the target language. Experimental diarization requires manual setup and acceptance of third-party model terms.