Dictation app using OpenAI's Whisper model for real-time transcription
Top 42.2% on sourcepulse
WhisperWriter is a desktop dictation application that transcribes microphone audio to text and automatically inputs it into the active window. It targets users who need hands-free text input, offering local or API-based transcription via OpenAI's Whisper model.
How It Works
The application uses the faster-whisper Python package for local inference or OpenAI's API for transcription. It runs in the background, listening for a configurable keyboard shortcut to begin recording. Multiple recording modes are supported, including continuous recording, voice activity detection, press-to-toggle, and hold-to-record. The transcribed text is then typed into the currently active application window through simulated keyboard input.
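The recording modes differ mainly in which event ends a recording. The following is a minimal Python sketch of that logic, not WhisperWriter's actual code: the names Mode, Event, step, and run are hypothetical, and the per-mode stop rules (key release for hold-to-record, a second key press for press-to-toggle and continuous, a pause in speech for voice activity detection) are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    CONTINUOUS = "continuous"
    VOICE_ACTIVITY_DETECTION = "voice_activity_detection"
    PRESS_TO_TOGGLE = "press_to_toggle"
    HOLD_TO_RECORD = "hold_to_record"


class Event(Enum):
    KEY_DOWN = "key_down"  # activation shortcut pressed
    KEY_UP = "key_up"      # activation shortcut released
    SILENCE = "silence"    # a pause in speech was detected


def step(mode: Mode, recording: bool, event: Event) -> tuple[bool, bool]:
    """Return (recording_after, flush).

    flush is True when the audio captured so far should be handed to
    the transcriber. All stop rules below are assumed, not confirmed.
    """
    if not recording:
        # Assumed: in every mode the shortcut starts a recording.
        return (event is Event.KEY_DOWN, False)
    if mode is Mode.HOLD_TO_RECORD:
        stop = event is Event.KEY_UP
        return (not stop, stop)
    if mode is Mode.PRESS_TO_TOGGLE:
        stop = event is Event.KEY_DOWN
        return (not stop, stop)
    if mode is Mode.VOICE_ACTIVITY_DETECTION:
        stop = event is Event.SILENCE
        return (not stop, stop)
    # CONTINUOUS: a pause flushes one utterance but recording goes on
    # until the shortcut is pressed again.
    if event is Event.SILENCE:
        return (True, True)
    if event is Event.KEY_DOWN:
        return (False, True)
    return (True, False)


def run(mode: Mode, events: list[Event]) -> tuple[bool, int]:
    """Replay events from an idle state; return (recording, flush_count)."""
    recording, flushes = False, 0
    for event in events:
        recording, flush = step(mode, recording, event)
        flushes += int(flush)
    return recording, flushes
```

Under these assumptions, continuous mode is the only one that hands audio to the transcriber at each pause while staying active; the other three flush once, when the recording ends.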
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Execute with python run.py.
To run faster-whisper on a GPU, CUDA 12 and cuBLAS/cuDNN 8 are required. The NVIDIA libraries can be installed via Docker, via pip on Linux, or by downloading them from Purfview's repository.
Highlighted Details
Supports local faster-whisper inference (GPU/CPU) and the OpenAI API.
Maintenance & Community
The project is actively maintained, with a recent major rewrite merged. Contributions are welcome via pull requests or issues.
Licensing & Compatibility
Licensed under the GNU General Public License (GPL). This may impose copyleft restrictions on derivative works, potentially impacting commercial or closed-source integration.
Limitations & Caveats
The project is described as having received minimal effort on testing and on ease of contribution. The recent major rewrite may introduce new bugs, and users are asked to be patient. Using CUDA 11 requires downgrading ctranslate2 to version 3.24.0.
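As a rough illustration of the CUDA 11 caveat, the Python sketch below builds the pip command that would apply the downgrade. The helper name ctranslate2_pip_command is hypothetical, the --force-reinstall flag is one way to replace an already installed version, and only the 3.24.0 pin comes from the text above.

```python
import sys
from typing import Optional

# Version pin taken from the caveat above; the rest is illustrative.
CUDA11_CTRANSLATE2_PIN = "ctranslate2==3.24.0"


def ctranslate2_pip_command(cuda_major: int) -> Optional[list[str]]:
    """Return a pip command for the given CUDA major version.

    Returns None when no downgrade is needed (CUDA 12, the default
    target described above). Other versions are outside this sketch.
    """
    if cuda_major == 12:
        return None
    if cuda_major == 11:
        return [
            sys.executable, "-m", "pip", "install",
            "--force-reinstall", CUDA11_CTRANSLATE2_PIN,
        ]
    raise ValueError("this sketch only covers CUDA 11 and CUDA 12")
```

The returned list can be passed to subprocess.run inside the environment that WhisperWriter is installed in; pinning the same version in requirements.txt would be an alternative.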