transcribe-anything by zackees

CLI tool for Whisper AI transcription and translation

created 4 years ago
1,034 stars

Top 36.9% on sourcepulse

Project Summary

This project provides a user-friendly, multi-backend interface for Whisper AI transcription, designed for ease of use and speed. It targets users needing to transcribe audio or video files, including those from URLs, with features like speaker diarization and GPU acceleration. The primary benefit is a simplified, private transcription workflow with optimized performance.

How It Works

The application leverages various Whisper backends, including OpenAI's original model (cuda), the highly optimized insanely-fast-whisper (insane), and whisper-mps (mps) for Apple Silicon (Mac ARM) acceleration. It uses yt-dlp for URL handling and static-ffmpeg for media processing. A key differentiator is its ability to generate a speaker.json file, which segments conversations by speaker, achieved through Hugging Face integration and pyannote.audio. Each backend runs in an isolated environment managed by uv, giving reliable dependency resolution and faster installs.

Quick Start & Requirements

  • Install via pip: pip install transcribe-anything
  • Usage: transcribe-anything <URL_or_FILE> [--device <cuda|insane|mps|cpu>]
  • GPU acceleration (cuda, insane) is automatic on Windows/Linux. Mac users can use --device mps.
  • For speaker diarization, a Hugging Face token is required, and users must agree to pyannote.audio policies.
  • Python 3.10+ is recommended.
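The Quick Start invocation above can also be composed programmatically, e.g. from a batch-processing script. The sketch below builds the command line shown in Usage and hands it to `subprocess`; the `--hf_token` flag name is an assumption for the diarization token (check `transcribe-anything --help` for the exact spelling).

```python
import subprocess

def build_transcribe_cmd(source, device="cpu", hf_token=None):
    """Compose a transcribe-anything command per the Usage line above.

    source   -- local file path or URL (handled via yt-dlp)
    device   -- one of cuda | insane | mps | cpu
    hf_token -- Hugging Face token for speaker diarization
                (flag name --hf_token is an assumption, not confirmed here)
    """
    cmd = ["transcribe-anything", source, "--device", device]
    if hf_token:
        cmd += ["--hf_token", hf_token]
    return cmd

cmd = build_transcribe_cmd("https://example.com/talk.mp4", device="insane")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually transcribe
```

Keeping the command as a list (rather than a shell string) avoids quoting issues with URLs that contain `&` or `?`.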

Highlighted Details

  • Offers multiple backends for optimized speed (insane, mps).
  • Unique feature: Generates speaker.json for speaker-attributed transcriptions.
  • Supports transcription and translation tasks.
  • Can embed subtitles directly into video files (--embed).
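The device guidance above (mps on Apple Silicon, cuda/insane on NVIDIA machines, cpu otherwise) can be sketched as a simple heuristic. This is an illustration only; transcribe-anything performs its own detection, and probing for `nvidia-smi` as a CUDA signal is an assumption.

```python
import platform
import shutil

def default_device():
    """Heuristic device pick mirroring the guidance above (a sketch,
    not the tool's actual detection logic)."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"  # Apple Silicon: English-only backend, no speaker.json
    if shutil.which("nvidia-smi"):  # assumed proxy for a usable NVIDIA GPU
        return "insane"  # fastest backend, but memory-hungry (see Limitations)
    return "cpu"

print(default_device())
```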

Maintenance & Community

The project is actively maintained by Zackees. Recent updates focus on improving backend compatibility, fixing dependency issues (e.g., with NumPy 2.0), and enhancing features like MPS support and speaker.json generation.

Licensing & Compatibility

The project appears to be MIT licensed, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The insane backend, while fast, can be memory-intensive and may cause out-of-memory errors on GPUs with limited VRAM. The mps backend is English-only and does not support speaker.json. Python 3.12 is not yet fully supported across all backends. Experimental features, such as insane mode with large-v3 and batching, may produce lower-quality transcriptions with timestamp misalignment.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 2
  • Issues (30d): 2
  • Star History: 338 stars in the last 90 days
