CLI tool for Whisper AI transcription and translation
This project provides a user-friendly, multi-backend interface for Whisper AI transcription, designed for ease of use and speed. It targets users who need to transcribe audio or video files, including media fetched from URLs, and supports speaker diarization and GPU acceleration. The primary benefit is a simplified, private transcription workflow with optimized performance.
How It Works
The application supports several Whisper backends: OpenAI's original model (cuda), the highly optimized insanely-fast-whisper (insane), and Apple's whisper-mps (mps) for Mac ARM acceleration. It uses yt-dlp for URL handling and static-ffmpeg for media processing. A key differentiator is its ability to generate a speaker.json file, which segments conversations by speaker; this is produced through Hugging Face integration and pyannote.audio. Environment isolation via uv keeps dependencies separated and makes installs faster.
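The following minimal Python sketch makes the backend split concrete. It assumes you want to choose the --device value yourself and call the CLI from a script. The helpers pick_device and build_command are hypothetical and not part of transcribe-anything. The platform heuristic (Apple Silicon uses mps; an NVIDIA GPU on Windows/Linux uses cuda or insane; anything else uses cpu) is only an illustration, not the tool's own detection logic. Only the four documented --device values (cuda, insane, mps, cpu) come from the project.

```python
import platform
from typing import List

# The four --device values documented in the usage line.
VALID_DEVICES = {"cuda", "insane", "mps", "cpu"}


def pick_device(system: str, machine: str, has_nvidia_gpu: bool,
                prefer_speed: bool = False) -> str:
    """Hypothetical heuristic mapping a platform to a --device value.

    This is NOT transcribe-anything's own selection logic; the tool
    already picks a GPU backend automatically on Windows/Linux.
    """
    if system == "Darwin" and machine == "arm64":
        return "mps"  # Apple Silicon: whisper-mps backend
    if has_nvidia_gpu and system in ("Windows", "Linux"):
        # insane is faster but can run out of VRAM, so this sketch
        # only chooses it when explicitly asked to.
        return "insane" if prefer_speed else "cuda"
    return "cpu"


def build_command(source: str, device: str) -> List[str]:
    """Build an argv list for the transcribe-anything CLI."""
    if device not in VALID_DEVICES:
        raise ValueError(
            f"unknown device {device!r}; expected one of {sorted(VALID_DEVICES)}"
        )
    return ["transcribe-anything", source, "--device", device]


if __name__ == "__main__":
    # Detecting an NVIDIA GPU is out of scope here, so it is passed in explicitly.
    device = pick_device(platform.system(), platform.machine(), has_nvidia_gpu=False)
    cmd = build_command("interview.mp4", device)
    print(" ".join(cmd))
    # To actually run it (requires `pip install transcribe-anything`):
    # import subprocess; subprocess.run(cmd, check=True)
```

The function returns an argv list rather than a single shell string. Passing a list to subprocess.run avoids quoting problems when the source is a URL containing characters such as & or ?.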
Quick Start & Requirements
pip install transcribe-anything
transcribe-anything <URL_or_FILE> [--device <cuda|insane|mps|cpu>]
GPU acceleration (cuda, insane) is automatic on Windows/Linux. Mac users can use --device mps.
… pyannote.audio policies.
Highlighted Details
… (insane, mps).
Generates speaker.json for speaker-attributed transcriptions.
… (--embed).
Maintenance & Community
The project is actively maintained by Zackees. Recent updates focus on improving backend compatibility, fixing dependency issues (e.g., with NumPy 2.0), and enhancing features such as MPS support and speaker.json generation.
Licensing & Compatibility
The project appears to be MIT licensed, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
The insane backend is fast but can be memory-intensive, and it may cause out-of-memory errors on GPUs with limited VRAM. The mps backend is English-only and does not support speaker.json. Python 3.12 is not yet fully supported in the backend. Experimental features such as insane mode with large-v3 and batching may produce lower-quality transcriptions with misaligned timestamps.
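The caveats above can be encoded as a preflight check that runs before a job starts. This is a minimal sketch under stated assumptions. check_options and its parameters (language, want_speakers, model) are hypothetical names, not options of transcribe-anything. The checks cover only what this section states: mps is English-only and has no speaker.json, insane is memory-hungry, and insane with large-v3 is experimental.

```python
from typing import List, Optional, Tuple


def check_options(
    device: str,
    language: Optional[str] = None,
    want_speakers: bool = False,
    model: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical preflight check mirroring the documented caveats.

    Returns (errors, warns). This helper is not part of transcribe-anything.
    """
    errors: List[str] = []
    warns: List[str] = []

    if device == "mps":
        # Documented: the mps backend is English-only.
        if language is not None and language.lower() not in ("en", "english"):
            errors.append(f"mps backend is English-only (got language={language!r})")
        # Documented: the mps backend cannot produce speaker.json.
        if want_speakers:
            errors.append("mps backend does not support speaker.json")

    if device == "insane":
        # Documented: fast but memory-intensive; may hit out-of-memory on low-VRAM GPUs.
        warns.append("insane backend may run out of GPU memory on low-VRAM cards")
        if model == "large-v3":
            # Documented as experimental: possible lower quality and timestamp misalignment.
            warns.append("insane with large-v3 is experimental; verify timestamps")

    return errors, warns


if __name__ == "__main__":
    errs, warns = check_options("mps", language="fr", want_speakers=True)
    for msg in errs:
        print("error:", msg)
    for msg in warns:
        print("warning:", msg)
```

The mps problems are treated as errors and the insane issues as warnings, following the wording above. The former are hard restrictions of the backend, while the latter are risks that depend on the GPU and settings.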