transcribe-anything by zackees

CLI tool for Whisper AI transcription and translation

Created 4 years ago

1,182 stars

Top 32.8% on SourcePulse

Project Summary

This project provides a user-friendly, multi-backend interface for Whisper AI transcription, designed for ease of use and speed. It targets users needing to transcribe audio or video files, including those from URLs, with features like speaker diarization and GPU acceleration. The primary benefit is a simplified, private transcription workflow with optimized performance.

How It Works

The application wraps several Whisper backends: OpenAI's original model (cuda), the highly optimized insanely-fast-whisper (insane), and Apple's whisper-mps (mps) for acceleration on ARM-based Macs. It uses yt-dlp to fetch media from URLs and static-ffmpeg for media processing. A key differentiator is the speaker.json file it can generate, which segments a conversation by speaker; this relies on Hugging Face integration and pyannote.audio. Environments are isolated with uv, which keeps dependencies separate and speeds up installs.

Quick Start & Requirements

Install via pip: pip install transcribe-anything
Usage: transcribe-anything <URL_or_FILE> [--device <cuda|insane|mps|cpu>]
GPU acceleration (cuda, insane) is automatic on Windows/Linux. Mac users can use --device mps.
For speaker diarization, a Hugging Face token is required, and users must agree to pyannote.audio policies.
Python 3.10+ is recommended.
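
The usage line above can also be driven from a script. The following is a minimal sketch, assuming the transcribe-anything command is already installed and on PATH; the helper names build_command and run_transcription are hypothetical and not part of the project. Only the positional source argument and the --device values shown above are used.

```python
import shutil
import subprocess

# Device names copied from the usage line above.
KNOWN_DEVICES = ("cuda", "insane", "mps", "cpu")


def build_command(source, device=None):
    """Return the argument list for one transcribe-anything run (hypothetical helper)."""
    if not source:
        raise ValueError("source must be a non-empty URL or file path")
    cmd = ["transcribe-anything", source]
    if device is not None:
        if device not in KNOWN_DEVICES:
            raise ValueError(f"unknown device: {device!r}")
        cmd += ["--device", device]
    return cmd


def run_transcription(source, device=None):
    """Run the CLI and return its exit code (hypothetical wrapper)."""
    if shutil.which("transcribe-anything") is None:
        raise RuntimeError("transcribe-anything is not on PATH; install it with pip first")
    # check=False: leave it to the caller to decide how to handle a non-zero exit.
    return subprocess.run(build_command(source, device), check=False).returncode


if __name__ == "__main__":
    # Show the command without running it.
    print(build_command("interview.mp4", device="insane"))
```

Building the argument list separately from running it makes the command easy to inspect or log before a long transcription starts.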

Highlighted Details

Offers multiple backends for optimized speed (insane, mps).
Unique feature: Generates speaker.json for speaker-attributed transcriptions.
Supports transcription and translation tasks.
Can embed subtitles directly into video files (--embed).

Maintenance & Community

The project is actively maintained by zackees. Recent updates focus on improving backend compatibility, fixing dependency issues (e.g., with NumPy 2.0), and enhancing features such as MPS support and speaker.json generation.

Licensing & Compatibility

The project appears to be MIT licensed, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The insane backend, while fast, can be memory-intensive and may lead to out-of-memory errors on GPUs with less VRAM. The mps backend is English-only and does not support speaker.json. Python 3.12 is not yet fully supported in the backend. Experimental features like insane mode with large-v3 and batching may produce lower-quality transcriptions with timestamp misalignment.
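
The backend caveats above can be turned into a small pre-flight check. This is a sketch only: caveats_for is a hypothetical helper, and representing English as the language code "en" is an assumption made for illustration. It restates the limitations listed above and does not query the tool or the hardware.

```python
def caveats_for(device, language="en", wants_speaker_json=False):
    """Return the caveats from the section above that apply to a planned run."""
    notes = []
    if device == "insane":
        notes.append("insane can be memory-intensive; GPUs with less VRAM may run out of memory")
    if device == "mps":
        # "en" as the code for English is an assumption of this sketch.
        if language != "en":
            notes.append("mps is English-only")
        if wants_speaker_json:
            notes.append("mps does not support speaker.json")
    return notes


if __name__ == "__main__":
    for note in caveats_for("mps", language="fr", wants_speaker_json=True):
        print("warning:", note)
```

An empty list means none of the listed backend caveats applies to that combination; it does not guarantee the run will succeed.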

Explore Similar Projects

Stage-Whisper by Stage-Whisper

RuntimeSpeechRecognizer by gtreshchev

awesome-whisper by sindresorhus

whisper-ctranslate2 by Softcatala

whisper.net by sandrohanea

Scriberr by rishikanthc

whisper-standalone-win by Purfview

faster-whisper-GUI by CheshireCC

whisper_real_time by davabase

WhisperLive by collabora

RealtimeSTT by KoljaB

ecoute by SevaSk