LiveWhisper by Nikorasu

Live transcription tool using OpenAI's Whisper

Created 2 years ago
357 stars

Top 78.2% on SourcePulse

Project Summary

This project provides a Python implementation for near real-time speech-to-text transcription using OpenAI's Whisper model and the sounddevice library. It's designed for users who need continuous audio processing and offers an optional voice assistant component for command-based interactions.

How It Works

The core livewhisper.py script captures microphone audio, buffering segments that exceed a volume and frequency threshold. Upon detecting silence, it saves the buffered audio to a temporary file and submits it to the Whisper model for transcription, outputting results sentence-by-sentence. The assistant.py script builds upon this, adding voice command capabilities for tasks like weather, Wikipedia searches, and media control.
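The buffering step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the constants, the function names `is_speech` and `segment`, and the specific choices of an RMS volume test plus a dominant-frequency test are all assumptions made for the example. It expects floating-point audio blocks scaled to the range -1.0 to 1.0.

```python
import numpy as np

SAMPLE_RATE = 16000        # assumed sample rate in Hz
BLOCK_SIZE = 480           # hypothetical block length (30 ms at 16 kHz)
THRESHOLD = 0.1            # hypothetical RMS volume threshold
VOCAL_RANGE = (50, 1000)   # hypothetical dominant-frequency band in Hz
END_BLOCKS = 3             # hypothetical: quiet blocks that close a segment


def is_speech(block):
    """True if the block is loud enough and its dominant frequency is in band."""
    rms = np.sqrt(np.mean(block ** 2))
    if rms < THRESHOLD:
        return False
    spectrum = np.abs(np.fft.rfft(block))
    freq = np.argmax(spectrum) * SAMPLE_RATE / len(block)
    return VOCAL_RANGE[0] <= freq <= VOCAL_RANGE[1]


def segment(blocks):
    """Yield one array per utterance, closed by END_BLOCKS consecutive quiet blocks."""
    buffer, quiet = [], 0
    for block in blocks:
        if is_speech(block):
            buffer.append(block)
            quiet = 0
        elif buffer:
            # Keep short pauses inside the utterance until silence persists.
            buffer.append(block)
            quiet += 1
            if quiet >= END_BLOCKS:
                yield np.concatenate(buffer)
                buffer, quiet = [], 0
    if buffer:
        # Flush whatever is left when the input ends.
        yield np.concatenate(buffer)
```

In a live setup the blocks would come from the microphone stream, and each yielded segment would then be written to a temporary file and passed to Whisper, as the paragraph above describes.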

Quick Start & Requirements

  • Install the Python dependencies via pip; an existing Whisper installation is required.
  • Dependencies: numpy, scipy, sounddevice, requests, pyttsx3, wikipedia, bs4.
  • Voice assistant requires espeak and python3-espeak.
  • Linux users may need to configure PulseAudio for noise/echo cancellation for media controls.
  • Official documentation and demo links are not provided in the README.
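Since no setup documentation is linked, a quick way to see which of the listed dependencies are already importable is a small check like the one below. It is only a sketch: the helper name `missing_modules` is hypothetical, and `whisper` is assumed to be the import name of the Whisper package. The other names are taken directly from the dependency list above.

```python
from importlib.util import find_spec

# Import names from the dependency list; "whisper" is an assumed import name.
DEPENDENCIES = [
    "whisper", "numpy", "scipy", "sounddevice",
    "requests", "pyttsx3", "wikipedia", "bs4",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DEPENDENCIES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

The check covers only Python modules. System packages such as espeak need to be verified separately.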

Highlighted Details

  • Near real-time, sentence-by-sentence transcription.
  • Voice assistant with customizable wake words ("computer", "hey computer", "okay computer").
  • Supports weather, date/time, jokes, Wikipedia searches, basic math, and media player control.
  • Media control functionality may require audio noise cancellation setup.
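To illustrate how wake words such as those listed above might gate commands, here is a minimal sketch that checks whether a transcript starts with a wake word and returns the remainder as the command. The function name `extract_command` and the matching rules are assumptions for illustration; the project's own handling may differ.

```python
# Longer phrases come first so "hey computer" is not matched as just "computer".
WAKE_WORDS = ("hey computer", "okay computer", "computer")


def extract_command(transcript, wake_words=WAKE_WORDS):
    """Return the text after a leading wake word, or None if none leads."""
    text = transcript.strip().lower()
    for wake in wake_words:
        if text.startswith(wake):
            rest = text[len(wake):]
            # Reject longer words that merely begin with the wake word,
            # such as "computers".
            if rest and rest[0].isalnum():
                continue
            return rest.strip(" ,.!?")
    return None
```

For example, `extract_command("Hey computer, what's the weather?")` returns `"what's the weather"`, while a transcript without a leading wake word returns `None`. A wake word on its own yields an empty string, which a caller could treat as a prompt to keep listening.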

Maintenance & Community

  • The project is maintained by Nikorasu.
  • A Ko-fi link is provided for donations.
  • No links to community channels, roadmaps, or other social platforms are present.

Licensing & Compatibility

  • The license is not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as a "nearly-live" implementation, implying potential latency. The voice assistant's ability to handle general requests relies on Google's instant-answer snippets, which may not always be reliable. Media control functionality is noted to require specific audio configuration.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Travis Fischer (founder of Agentic).

RealtimeSTT by KoljaB

Speech-to-text library for realtime applications

Created 2 years ago
Updated 2 months ago
9k stars

Top 0.5% on SourcePulse