whisper-writer by savbell

Dictation app using OpenAI's Whisper model for real-time transcription

Created 2 years ago

998 stars

Top 37.2% on SourcePulse

1 Expert Loves This Project

Jeremy Howard

Cofounder of fast.ai

Project Summary

WhisperWriter is a desktop dictation application that transcribes microphone audio to text and automatically inputs it into the active window. It targets users who need hands-free text input, offering local or API-based transcription via OpenAI's Whisper model.

How It Works

The application uses the faster-whisper Python package for local inference, or OpenAI's API for remote transcription. It runs in the background and waits for a configurable keyboard shortcut before it starts recording. Four recording modes are supported: continuous recording, voice activity detection, press-to-toggle, and hold-to-record. The transcribed text is then typed into the currently active application window through simulated keyboard input.
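The difference between press-to-toggle and hold-to-record can be sketched as a small state machine driven by activation-key events. This is an illustrative sketch only, not WhisperWriter's code: the class name, the mode constants, and the event method names are all assumptions made for the example, and the continuous and voice-activity-detection modes are left out because they depend on audio analysis.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mode labels; WhisperWriter's real configuration values may differ.
PRESS_TO_TOGGLE = "press_to_toggle"
HOLD_TO_RECORD = "hold_to_record"


@dataclass
class RecordingController:
    """Tracks whether recording is active, based on activation-key events."""

    mode: str
    recording: bool = False
    log: List[str] = field(default_factory=list)

    def on_key_down(self) -> None:
        if self.mode == PRESS_TO_TOGGLE:
            # Each press flips the state: the first starts, the second stops.
            self._set(not self.recording)
        elif self.mode == HOLD_TO_RECORD:
            # Recording is active only while the key is held down.
            self._set(True)

    def on_key_up(self) -> None:
        if self.mode == HOLD_TO_RECORD:
            self._set(False)
        # In press-to-toggle mode, releasing the key changes nothing.

    def _set(self, active: bool) -> None:
        # Record a transition only when the state actually changes, so
        # auto-repeated key-down events in hold mode are ignored.
        if active != self.recording:
            self.recording = active
            self.log.append("start" if active else "stop")
```

In this sketch, a real application would replace the `log` entries with calls that start and stop audio capture, and would pass the captured audio to the transcription backend on each stop.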

Quick Start & Requirements

Install: Clone the repository, create a virtual environment, and run pip install -r requirements.txt. Execute with python run.py.
Prerequisites: Python 3.11 and Git. GPU acceleration with faster-whisper additionally requires CUDA 12 with cuBLAS and cuDNN 8. The NVIDIA libraries can be installed via Docker, via pip on Linux, or by downloading them from Purfview's repository.
Setup: Refer to the official documentation for NVIDIA library installation.
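The prerequisites above can be checked before installation with a short preflight script. The following is a minimal sketch that assumes Python 3.11 is an exact requirement (the summary does not say whether newer versions also work). The function name `missing_prerequisites` is hypothetical and not part of the project.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def missing_prerequisites(
    python_version: Tuple[int, int] = sys.version_info[:2],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed.

    Both arguments are injectable so the check can be exercised without
    changing the real interpreter or PATH.
    """
    problems = []
    # Assumption: treat 3.11 as an exact match, as listed in the prerequisites.
    if python_version != (3, 11):
        problems.append(
            f"Python 3.11 expected, found {python_version[0]}.{python_version[1]}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("git") is None:
        problems.append("git not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the running interpreter and the current PATH. GPU libraries are deliberately not checked here, since detecting CUDA and cuDNN reliably is platform-specific.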

Highlighted Details

Supports both local faster-whisper inference (GPU/CPU) and OpenAI API.
Offers four distinct recording modes: continuous, voice activity detection, press-to-toggle, and hold-to-record.
UI recently migrated from Tkinter to PyQt5; the same rewrite added a settings window and the continuous recording mode.
Configurable activation key, transcription language, model, and various input/output parameters.

Maintenance & Community

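The project is described as actively maintained, with a major rewrite recently merged, although the Health Check section lists the last commit as 1 year ago and responsiveness as inactive. Contributions are welcome via pull requests or issues.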

Licensing & Compatibility

Licensed under the GNU General Public License (GPL). The GPL's copyleft terms apply to derivative works, which can restrict commercial or closed-source integration.

Limitations & Caveats

According to the project, little effort has gone into testing or into making contribution easy. The recent major rewrite may have introduced new bugs, and users are asked to be patient. Using CUDA 11 requires downgrading ctranslate2 to version 3.24.0.
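The CUDA 11 caveat can be turned into a quick check against the installed ctranslate2 version. The sketch below assumes that any release newer than 3.24.0 needs the downgrade and that version strings are plain dotted numbers. The helper names are hypothetical, and the CUDA major version has to be supplied by the caller because detecting it is outside the scope of this example.

```python
from importlib import metadata
from typing import Optional, Tuple

# Version named in the caveat above as the one to downgrade to for CUDA 11.
CUDA11_MAX_CTRANSLATE2 = (3, 24, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted release such as '4.1.0' into a tuple of ints.

    Assumption: no pre-release suffixes (for example 'rc1') are present.
    """
    return tuple(int(part) for part in text.split(".")[:3])


def needs_downgrade(cuda_major: int, ctranslate2_version: str) -> bool:
    """True when CUDA 11 is in use and ctranslate2 is newer than 3.24.0."""
    return (
        cuda_major == 11
        and parse_version(ctranslate2_version) > CUDA11_MAX_CTRANSLATE2
    )


def installed_ctranslate2() -> Optional[str]:
    """Return the installed ctranslate2 version, or None if it is absent."""
    try:
        return metadata.version("ctranslate2")
    except metadata.PackageNotFoundError:
        return None
```

If `needs_downgrade` returns true, the downgrade itself would typically be done with pip by pinning `ctranslate2==3.24.0` inside the project's virtual environment.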

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive

Star History

18 stars in the last 30 days