zackees
CLI tool for Whisper AI transcription and translation
Top 33.8% on SourcePulse
This project provides a user-friendly, multi-backend interface for Whisper AI transcription, designed for ease of use and speed. It targets users needing to transcribe audio or video files, including those from URLs, with features like speaker diarization and GPU acceleration. The primary benefit is a simplified, private transcription workflow with optimized performance.
How It Works
The application supports several Whisper backends: OpenAI's original model (cuda), the highly optimized insanely-fast-whisper (insane), and Apple's whisper-mps (mps) for acceleration on ARM Macs. It uses yt-dlp for URL handling and static-ffmpeg for media processing. A key differentiator is its ability to generate a speaker.json file, which segments conversations by speaker; this relies on Hugging Face integration and pyannote.audio. Environments are isolated with uv, which keeps dependencies separate and speeds up installs.
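The mapping from backend to device name can be illustrated with a small Python sketch. The device names (cuda, insane, mps, cpu) come from the text; the function `choose_device` and its selection heuristic are hypothetical and are not part of the project.

```python
import platform


def choose_device(system: str, machine: str, has_nvidia_gpu: bool) -> str:
    """Pick a value for --device from basic host facts (hypothetical heuristic)."""
    if system == "Darwin" and machine == "arm64":
        return "mps"   # Apple's whisper-mps backend for ARM Macs
    if has_nvidia_gpu:
        return "cuda"  # OpenAI's original model on GPU; "insane" is the faster alternative
    return "cpu"       # fallback when no accelerator is assumed


if __name__ == "__main__":
    # has_nvidia_gpu is passed in by the caller; this sketch does not probe for a GPU.
    print(choose_device(platform.system(), platform.machine(), has_nvidia_gpu=False))
```

The host facts are passed in as arguments rather than detected inside the function, so the choice is easy to check without the matching hardware.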
Quick Start & Requirements
Install: pip install transcribe-anything
Usage: transcribe-anything <URL_or_FILE> [--device <cuda|insane|mps|cpu>]
… (cuda, insane) is automatic on Windows/Linux. Mac users can use --device mps.
… pyannote.audio policies.
Highlighted Details
… insane, mps).
… speaker.json for speaker-attributed transcriptions.
… --embed).
Maintenance & Community
The project is actively maintained by Zackees. Recent updates focus on improving backend compatibility, fixing dependency issues (e.g., with NumPy 2.0), and enhancing features like MPS support and speaker.json generation.
Licensing & Compatibility
The project appears to be MIT licensed, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
The insane backend, while fast, can be memory-intensive and may lead to out-of-memory errors on GPUs with less VRAM. The mps backend is English-only and does not support speaker.json. Python 3.12 is not yet fully supported in the backend. Experimental features like insane mode with large-v3 and batching may produce lower-quality transcriptions with timestamp misalignment.
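These caveats can be checked before a run. The sketch below is a hypothetical Python wrapper, not part of the project: `check_caveats` and `build_command` are illustrative names. It uses only the command form given under Quick Start (`transcribe-anything <URL_or_FILE> --device <...>`) and passes no other flags.

```python
from typing import List

# Device names as listed under Quick Start.
VALID_DEVICES = ("cuda", "insane", "mps", "cpu")


def check_caveats(device: str, language: str = "en",
                  want_speaker_json: bool = False) -> List[str]:
    """Return warnings for combinations that the caveats above rule out."""
    if device not in VALID_DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    warnings = []
    if device == "mps" and language != "en":
        warnings.append("mps backend is English-only")
    if device == "mps" and want_speaker_json:
        warnings.append("mps backend does not support speaker.json")
    if device == "insane":
        warnings.append("insane backend may run out of memory on GPUs with less VRAM")
    return warnings


def build_command(source: str, device: str) -> List[str]:
    """Build the argv list for the CLI form shown under Quick Start."""
    if device not in VALID_DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    return ["transcribe-anything", source, "--device", device]
```

Once the package is installed, the list returned by `build_command` can be passed to `subprocess.run(cmd, check=True)`. Returning warnings instead of raising leaves the caller free to decide whether to fall back to another device.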