whisper.el by natrys: Emacs Speech-to-Text integration
whisper.el is a Speech-to-Text interface for Emacs that uses OpenAI's Whisper model with whisper.cpp as the inference engine. It transcribes audio from a microphone or a file offline, directly within Emacs, and suits users who prefer local processing and want to avoid cloud services. The primary benefit is capable STT without cloud dependencies or recurring costs.
How It Works
The project integrates whisper.cpp, a C++ port of OpenAI's Whisper model, into Emacs Lisp. This allows for local inference on consumer-grade CPUs, eliminating the need for high-end GPUs. Users can capture audio via their input device or select a media file, with transcriptions automatically inserted into Emacs buffers. An option to translate spoken language to English is also provided. The system automatically compiles whisper.cpp and downloads necessary models on first use.
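The workflow described above could be set up in an init file roughly as follows. This is a minimal sketch, not the project's documented configuration: it assumes whisper.el is already installed and use-package is available, and the command name `whisper-run`, the options `whisper-model`, `whisper-language`, and `whisper-translate`, and the key binding are assumptions to verify against the project's README. The helper `my/whisper-trim-transcript` is hypothetical, and the sketch assumes `whisper-after-transcription-hook` runs with the transcription buffer current.

```elisp
;; Sketch of an init-file setup. Assumes whisper.el is already installed
;; and that `use-package' is available.
(use-package whisper
  ;; Hypothetical key binding; `whisper-run' is assumed to be the command
  ;; that starts and stops recording.
  :bind ("C-c w" . whisper-run)
  :config
  ;; Assumed option names; check them against the README.
  (setq whisper-model "base"      ; smaller models are faster, less accurate
        whisper-language "en"     ; language being spoken
        whisper-translate nil))   ; non-nil would translate speech to English

;; Hypothetical post-processing helper: trim surrounding whitespace.
(defun my/whisper-trim-transcript ()
  "Delete leading and trailing whitespace in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (skip-chars-forward " \t\n")
    (delete-region (point-min) (point))
    (goto-char (point-max))
    (skip-chars-backward " \t\n")
    (delete-region (point) (point-max))))

;; Assumes this hook runs with the transcription buffer current.
(add-hook 'whisper-after-transcription-hook #'my/whisper-trim-transcript)
```

With a setup like this, pressing the bound key once would start recording and pressing it again would stop and transcribe, with the text inserted at point. The first invocation is when the automatic compilation and model download would take place.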
Quick Start & Requirements
Install with use-package, or directly from VC source (Emacs 29.1+).
Requires whisper.cpp. FFmpeg is necessary for audio recording.
First use compiles whisper.cpp and downloads the selected language model.
Highlighted Details
Supports several backends: whisper.cpp execution, local HTTP server, remote server, and OpenAI API compatibility.
whisper.cpp can be re-compiled with hardware acceleration (e.g., cuBLAS, Core ML, OpenVINO) for performance gains. See the whisper.cpp build options.
Hooks are available (whisper-before-transcription-hook, whisper-after-transcription-hook, whisper-after-insert-hook).
whisper-ctranslate2 can be integrated. See the ctranslate2 integration.
Maintenance & Community
The project relies on the active development of the upstream whisper.cpp repository. No specific community channels (e.g., Discord, Slack) or a public roadmap are detailed in the README.
Licensing & Compatibility
The README does not explicitly state the software license for whisper.el. Compatibility with commercial use or with closed-source applications is therefore undetermined.
Limitations & Caveats
The README notes that real-time transcription is likely infeasible at present. Accuracy depends on model size, language, and audio quality. Although whisper.cpp is optimized for CPU, demanding use cases might still require significant hardware resources. The training data for the Whisper models themselves is not publicly available.