Live transcription tool using OpenAI's Whisper
This project provides a Python implementation for near real-time speech-to-text transcription using OpenAI's Whisper model and the sounddevice
library. It's designed for users who need continuous audio processing and offers an optional voice assistant component for command-based interactions.
How It Works
The core livewhisper.py script captures microphone audio, buffering segments that exceed a volume and frequency threshold. Upon detecting silence, it saves the buffered audio to a temporary file and submits it to the Whisper model for transcription, outputting results sentence by sentence. The assistant.py script builds on this, adding voice-command capabilities for tasks like weather, Wikipedia searches, and media control.
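The sketch below illustrates this capture-buffer-transcribe loop under stated assumptions: a 16 kHz mono stream, an RMS volume check only (the project's frequency check is omitted), and illustrative threshold values, block counts, model size, and file name that are not taken from the project's code.

```python
# Minimal sketch of a capture -> buffer -> transcribe loop (assumptions noted above).
import queue

import numpy as np
import sounddevice as sd
import whisper
from scipy.io import wavfile

SAMPLE_RATE = 16000        # Whisper expects 16 kHz mono audio
VOLUME_THRESHOLD = 0.01    # assumed RMS level treated as speech
SILENCE_BLOCKS = 20        # assumed number of quiet blocks that end a phrase

audio_q: "queue.Queue[np.ndarray]" = queue.Queue()

def on_audio(indata, frames, time, status):
    # sounddevice calls this for every captured block; push channel 0 onto a queue.
    audio_q.put(indata[:, 0].copy())

model = whisper.load_model("base")
buffered, quiet = [], 0

with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, callback=on_audio):
    while True:
        block = audio_q.get()
        if np.sqrt(np.mean(block ** 2)) > VOLUME_THRESHOLD:
            buffered.append(block)
            quiet = 0
        elif buffered:
            quiet += 1
            if quiet >= SILENCE_BLOCKS:
                # Silence detected: write the buffered speech to a temporary WAV
                # file and hand it to Whisper for transcription.
                clip = np.concatenate(buffered)
                wavfile.write("phrase.wav", SAMPLE_RATE, clip)
                print(model.transcribe("phrase.wav")["text"].strip())
                buffered, quiet = [], 0
```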
Quick Start & Requirements
Install via pip (requires an existing Whisper installation). Python dependencies: numpy, scipy, sounddevice, requests, pyttsx3, wikipedia, and bs4. The system packages espeak and python3-espeak are also needed.
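As a quick post-install check (a sketch, not part of the project), you can confirm that sounddevice sees a microphone and that a Whisper model loads; the "base" model size is an arbitrary choice for illustration.

```python
# Post-install sanity check: list capture devices and load a Whisper model.
import sounddevice as sd
import whisper

print(sd.query_devices())           # the microphone should appear in this list
model = whisper.load_model("base")  # downloads the model weights on first use
print("Whisper model loaded; ready for live transcription.")
```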
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is described as a "nearly-live" implementation, implying potential latency. The voice assistant's ability to handle general requests relies on Google's instant-answer snippets, which may not always be reliable. Media control functionality is noted to require specific audio configuration.