LiveWhisper  by Nikorasu

Live transcription tool using OpenAI's Whisper

created 2 years ago
353 stars

Top 80.0% on sourcepulse

Project Summary

This project provides a Python implementation for near real-time speech-to-text transcription using OpenAI's Whisper model and the sounddevice library. It's designed for users who need continuous audio processing and offers an optional voice assistant component for command-based interactions.

How It Works

The core livewhisper.py script captures microphone audio, buffering segments that exceed a volume and frequency threshold. Upon detecting silence, it saves the buffered audio to a temporary file and submits it to the Whisper model for transcription, outputting results sentence-by-sentence. The assistant.py script builds upon this, adding voice command capabilities for tasks like weather, Wikipedia searches, and media control.
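The buffering step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the constants, the RMS volume check, the FFT peak check, and the names `is_speech` and `segment` are all assumptions made for the example.

```python
import numpy as np

# All values below are assumptions for illustration, not taken from livewhisper.py.
SAMPLE_RATE = 16000       # Hz, mono audio
BLOCK_SIZE = 480          # samples per block (30 ms at 16 kHz)
VOLUME_THRESHOLD = 0.01   # minimum RMS level for a block to count as sound
VOICE_BAND = (85, 3000)   # Hz range in which the dominant frequency must fall
END_BLOCKS = 10           # consecutive quiet blocks that end an utterance


def is_speech(block):
    """Return True if the block is loud enough and peaks inside the voice band."""
    rms = np.sqrt(np.mean(block ** 2))
    if rms < VOLUME_THRESHOLD:
        return False
    spectrum = np.abs(np.fft.rfft(block))
    freqs = np.fft.rfftfreq(len(block), d=1 / SAMPLE_RATE)
    peak = freqs[np.argmax(spectrum)]
    return bool(VOICE_BAND[0] <= peak <= VOICE_BAND[1])


def segment(blocks):
    """Yield one array per utterance, closing it after END_BLOCKS quiet blocks."""
    buffer, quiet = [], 0
    for block in blocks:
        if is_speech(block):
            buffer.append(block)
            quiet = 0
        elif buffer:
            quiet += 1
            if quiet >= END_BLOCKS:
                yield np.concatenate(buffer)
                buffer, quiet = [], 0
    if buffer:  # flush whatever is left when the input ends
        yield np.concatenate(buffer)
```

In a live setting the blocks would come from a microphone stream (for example via sounddevice), and each yielded utterance would be written to a temporary file and passed to Whisper, as the paragraph above describes. Those two steps are left out here so the sketch runs without audio hardware or a model download.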

Quick Start & Requirements

  • Install the dependencies via pip; an existing Whisper installation is required.
  • Dependencies: numpy, scipy, sounddevice, requests, pyttsx3, wikipedia, bs4.
  • Voice assistant requires espeak and python3-espeak.
  • Linux users may need to configure PulseAudio noise/echo cancellation before the media controls work as intended.
  • Official documentation and demo links are not provided in the README.

Highlighted Details

  • Near real-time, sentence-by-sentence transcription.
  • Voice assistant with customizable wake words ("computer", "hey computer", "okay computer").
  • Supports weather, date/time, jokes, Wikipedia searches, basic math, and media player control.
  • Media control functionality may require audio noise cancellation setup.
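As a rough illustration of how a transcribed sentence could be checked for one of the wake words listed above, consider the sketch below. The function name `extract_command` and the normalisation steps are hypothetical and are not taken from assistant.py; only the three wake phrases come from the project description.

```python
import re

# Wake phrases as listed in the project description.
WAKE_PHRASES = ("hey computer", "okay computer", "computer")


def extract_command(transcript, wake_phrases=WAKE_PHRASES):
    """Return the text following a leading wake phrase, or None if there is none.

    Hypothetical helper: lower-cases the transcript, strips punctuation,
    and tries longer phrases first so "hey computer" wins over "computer".
    """
    text = re.sub(r"[^\w\s]", "", transcript.lower()).strip()
    for phrase in sorted(wake_phrases, key=len, reverse=True):
        if text == phrase or text.startswith(phrase + " "):
            return text[len(phrase):].strip()
    return None
```

Requiring the wake phrase at the start of the sentence keeps ordinary speech that merely mentions the word "computer" from being treated as a command; whether the project applies the same rule is not stated in the summary.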

Maintenance & Community

  • The project is maintained by Nikorasu.
  • A Ko-fi link is provided for donations.
  • No links to community channels, roadmaps, or other social platforms are present.

Licensing & Compatibility

  • The license is not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project describes itself as a "nearly-live" implementation, so some latency between speech and output should be expected: audio is transcribed only after a pause is detected. The voice assistant handles general requests through Google's instant-answer snippets, which may not always be reliable. Media control may require noise/echo cancellation to be configured, for example through PulseAudio on Linux.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days
