whisper.el by natrys

Emacs Speech-to-Text integration

Created 3 years ago
250 stars

Top 100.0% on SourcePulse

Project Summary

Speech-to-Text interface for Emacs using OpenAI's Whisper model, with whisper.cpp as the inference engine. It enables offline transcription of audio from a microphone or from media files directly within Emacs, catering to users who prefer local processing over cloud services. The primary benefit is powerful STT inside Emacs without cloud dependencies or recurring costs.

How It Works

The project integrates whisper.cpp, a C++ port of OpenAI's Whisper model, into Emacs Lisp. This allows for local inference on consumer-grade CPUs, eliminating the need for high-end GPUs. Users can capture audio via their input device or select a media file, with transcriptions automatically inserted into Emacs buffers. An option to translate spoken language to English is also provided. The system automatically compiles whisper.cpp and downloads necessary models on first use.

Quick Start & Requirements

  • Installation: Clone the repository and configure via use-package, or install directly from source with the built-in package-vc support (Emacs 29.1+).
  • Prerequisites: A C++ compiler and CMake are required to build whisper.cpp. FFmpeg is necessary for audio recording.
  • Setup: The first invocation automatically compiles whisper.cpp and downloads the selected language model.
  • macOS Specific: Users may need to grant Emacs explicit microphone permissions. See macOS Configuration.
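The steps above can be condensed into a single use-package declaration. This is a minimal sketch, not the canonical configuration: the keybinding is arbitrary, and the variable names (whisper-install-directory, whisper-model, whisper-language, whisper-translate) are assumed to match the package's current user options, so check their docstrings with C-h v before relying on them.

```elisp
;; Sketch of a whisper.el setup via use-package.
;; NOTE: the `:vc' keyword needs Emacs 30+; on Emacs 29.1 run
;; M-x package-vc-install with the repository URL instead.
(use-package whisper
  :vc (:url "https://github.com/natrys/whisper.el")
  :bind ("C-c w" . whisper-run)        ; illustrative keybinding
  :config
  (setq whisper-install-directory "~/.emacs.d/whisper/" ; where whisper.cpp is built
        whisper-model "base"           ; tiny .. large-v3-turbo; "base.en" for English-only
        whisper-language "en"
        whisper-translate nil))        ; set to t to translate speech to English
```

On first invocation of whisper-run, the package compiles whisper.cpp and downloads the configured model into the install directory, so the initial run takes noticeably longer than subsequent ones.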

Highlighted Details

  • Supports multiple Whisper model sizes (tiny to large-v3-turbo) and English-only variants.
  • Offers various inference modes: direct whisper.cpp execution, local HTTP server, remote server, and OpenAI API compatibility.
  • whisper.cpp can be re-compiled with hardware acceleration (e.g., cuBLAS, Core ML, OpenVINO) for performance gains. See whisper.cpp build options.
  • Model quantization options (e.g., q4_0, q5_1) are available to reduce resource usage.
  • Extensible via hooks (whisper-before-transcription-hook, whisper-after-transcription-hook, whisper-after-insert-hook).
  • Alternative inference engines like whisper-ctranslate2 can be integrated. See ctranslate2 integration.
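The three hooks listed above allow wiring transcription into a larger workflow. A common pattern is pausing media playback while recording; the sketch below assumes playerctl is installed, and the helper function names are hypothetical stand-ins for whatever player control you use.

```elisp
;; Illustrative hook usage; `my/pause-music' and `my/resume-music'
;; are hypothetical helpers, not part of whisper.el.
(defun my/pause-music ()
  "Pause media playback before recording starts."
  (call-process "playerctl" nil nil nil "pause"))

(defun my/resume-music ()
  "Resume media playback once transcription finishes."
  (call-process "playerctl" nil nil nil "play"))

(add-hook 'whisper-before-transcription-hook #'my/pause-music)
(add-hook 'whisper-after-transcription-hook  #'my/resume-music)

;; Runs after the transcribed text has been inserted into the buffer,
;; e.g. for logging or post-processing.
(add-hook 'whisper-after-insert-hook
          (lambda () (message "Transcription inserted.")))
```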

Maintenance & Community

The project relies on the active development of the upstream whisper.cpp repository. No specific community channels (e.g., Discord, Slack) or a public roadmap are detailed in the README.

Licensing & Compatibility

The README does not explicitly state the software license for whisper.el. Compatibility for commercial use or linking with closed-source applications is therefore undetermined without a specified license.

Limitations & Caveats

Real-time transcription is noted as likely infeasible with the current design. Accuracy depends on model size, language, and audio quality. While whisper.cpp is optimized for CPU inference, demanding use cases may still require significant hardware resources. The training data for the Whisper models themselves is not publicly available.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 4
  • Star history: 10 stars in the last 30 days
