whisper-ctranslate2  by Softcatala

CLI tool for faster Whisper transcription/translation

created 2 years ago
1,076 stars

Top 35.8% on sourcepulse

Project Summary

This project provides a command-line interface for the Whisper speech-to-text model, with inference running on CTranslate2 for performance. It targets users who need faster, more memory-efficient transcription and translation than the original OpenAI Whisper implementation. Its command-line options are compatible with the original OpenAI client, which eases migration.

How It Works

The client runs Whisper on CTranslate2, a fast inference engine for Transformer models. Optimized kernels and reduced-precision weights (INT8 quantization, FP16) yield speedups of up to 4x and lower memory usage. Batched inference offers further performance gains, and an integrated Voice Activity Detection (VAD) filter skips parts of the audio that contain no speech.
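To make the memory claim concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is a generic illustration of the technique, not CTranslate2's actual kernels or storage format, and the function names and sample weights are invented for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats onto integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.50, -1.27, 0.03, 0.89]

quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)

# Rounding to the nearest integer bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# FP32 stores 4 bytes per weight, INT8 stores 1 (plus one shared scale).
fp32_bytes = 4 * len(weights)
int8_bytes = 1 * len(weights)

print(quantized, max_error, fp32_bytes, int8_bytes)
```

The trade-off is a small, bounded rounding error per weight in exchange for roughly a quarter of the FP32 storage. FP16 sits in between, halving storage without integer rounding.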

Quick Start & Requirements

  • Install: pip install -U whisper-ctranslate2
  • Docker: docker pull ghcr.io/softcatala/whisper-ctranslate2:latest
  • GPU support requires NVIDIA cuBLAS 11.x and cuDNN 8.x.
  • CPU support includes x86-64 and ARM64 with various backends.
  • Documentation: https://github.com/Softcatala/whisper-ctranslate2
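After installation, a first run can be scripted. The sketch below builds a basic transcription command and only executes it if the CLI is found on the PATH. The file name `audio.mp3` is hypothetical, and the `--model medium` argument follows the upstream README as I understand it; confirm against `whisper-ctranslate2 --help`.

```python
import shutil
import subprocess

CLI_NAME = "whisper-ctranslate2"


def build_transcribe_command(audio_path, model="medium"):
    """Return the argument list for a basic transcription run."""
    return [CLI_NAME, audio_path, "--model", model]


def transcribe(audio_path, model="medium"):
    """Run the CLI, failing early with a clear message if it is not installed."""
    if shutil.which(CLI_NAME) is None:
        raise RuntimeError(f"{CLI_NAME} not found; try: pip install -U {CLI_NAME}")
    return subprocess.run(build_transcribe_command(audio_path, model), check=True)


# "audio.mp3" is a placeholder; nothing is executed here, the command is only printed.
command = build_transcribe_command("audio.mp3")
print(" ".join(command))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.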

Highlighted Details

  • Up to 4x faster than OpenAI Whisper, with lower memory usage.
  • Supports transcription and translation (to English).
  • Options for batched inference, quantization (--compute_type), VAD filtering, and live microphone transcription.
  • Experimental diarization support via pyannote.audio, which requires a Hugging Face token and acceptance of the terms for specific models.
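The options above can be combined in one invocation. The following sketch assembles such a command without running it. Only `--compute_type` is named on this page; `--task`, `--vad_filter True`, and `--batched True` are assumptions based on the upstream README and the OpenAI client's conventions, so verify them with `whisper-ctranslate2 --help`. The helper name and `audio.mp3` are hypothetical.

```python
def build_command(audio_path, *, task="transcribe", compute_type=None,
                  vad_filter=False, batched=False):
    """Assemble a whisper-ctranslate2 argument list from optional features."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")

    command = ["whisper-ctranslate2", audio_path, "--task", task]
    if compute_type is not None:
        # Quantization / precision setting, for example "int8" or "float16".
        command += ["--compute_type", compute_type]
    if vad_filter:
        # Assumed flag spelling: boolean options take an explicit True value.
        command += ["--vad_filter", "True"]
    if batched:
        # Assumed flag spelling for batched inference.
        command += ["--batched", "True"]
    return command


# Translate to English with INT8 weights, VAD filtering, and batching enabled.
command = build_command("audio.mp3", task="translate", compute_type="int8",
                        vad_filter=True, batched=True)
print(" ".join(command))
```

Keeping every feature off by default means the plain call produces the same basic command a user would type by hand.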

Maintenance & Community

Licensing & Compatibility

  • License: MIT.
  • Compatible with commercial use.

Limitations & Caveats

Translation is currently limited to English as the target language. Experimental diarization requires manual setup and acceptance of third-party model terms.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 73 stars in the last 90 days
