whisper-flow  by dimastatz

Real-time audio transcription using OpenAI Whisper

Created 1 year ago
297 stars

Top 89.4% on SourcePulse

Project Summary

This project provides a real-time transcription service built on OpenAI's Whisper model, letting applications send streaming audio and receive text output as the audio arrives. It is aimed at developers and researchers who need to integrate live speech-to-text into their systems, and it returns text with lower latency than batch processing, which waits for the complete recording.

How It Works

Whisper Flow processes audio streams by splitting them into segments based on natural speech patterns such as pauses or speaker changes. It manages these segments with a "tumbling window" approach, in which consecutive windows do not overlap. Transcriptions are returned incrementally: each window produces partial results while it is still open and a result marked as final once it closes, so applications can update displayed text dynamically. This design prioritizes low latency for real-time interaction.
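The windowing behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names Result, is_silence, stream_transcribe, and fake_transcribe are hypothetical, the energy-threshold pause detector is an assumed stand-in for whatever segmentation the project uses, and the transcriber is a stub rather than a real Whisper call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Result:
    """One transcription update (hypothetical shape, not the project's schema)."""
    text: str
    is_partial: bool


def is_silence(chunk: List[float], threshold: float = 0.01) -> bool:
    """Assumed pause detector: mean absolute amplitude below a threshold."""
    if not chunk:
        return True
    return sum(abs(s) for s in chunk) / len(chunk) < threshold


def stream_transcribe(
    chunks: Iterable[List[float]],
    transcribe: Callable[[List[float]], str],
) -> Iterator[Result]:
    """Yield partial results while a window is open and a final one when it closes."""
    window: List[float] = []
    for chunk in chunks:
        if is_silence(chunk):
            # A pause closes the current window: emit the final result and
            # start a fresh, non-overlapping window.
            if window:
                yield Result(transcribe(window), is_partial=False)
                window = []
            continue
        # Speech extends the open window; re-transcribe it as a partial result.
        window.extend(chunk)
        yield Result(transcribe(window), is_partial=True)
    # End of stream: flush whatever is still buffered as a final result.
    if window:
        yield Result(transcribe(window), is_partial=False)


def fake_transcribe(samples: List[float]) -> str:
    """Stub transcriber for the demo; a real one would run a speech model."""
    return f"{len(samples)} samples"


# Two speech chunks, a pause, then one more speech chunk.
demo_chunks = [[0.5] * 4, [0.5] * 4, [0.0] * 4, [0.3] * 4]
results = list(stream_transcribe(demo_chunks, fake_transcribe))
for r in results:
    print("partial" if r.is_partial else "final  ", r.text)
```

A client consuming such a stream would typically overwrite the displayed text on each partial result and commit it once the final result for that window arrives.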

Quick Start & Requirements

To run as a web server:

git clone https://github.com/dimastatz/whisper-flow.git
cd whisper-flow
./run.sh -local
source .venv/bin/activate
./run.sh -benchmark

As a Python package:

pip install whisperflow

Requires Python and potentially specific audio libraries. Benchmarking was performed on a MacBook Air M1 with 16GB RAM.

Highlighted Details

  • Achieves sub-500ms latency and approximately 7% Word Error Rate (WER) on a MacBook Air M1.
  • Supports real-time transcription of streaming audio via WebSocket endpoints.
  • Outputs incremental transcriptions with IsPartial flags for dynamic updates.
  • Benchmarking relies on the LibriSpeech dataset and metrics like WER and latency.
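
For readers unfamiliar with the WER metric mentioned above, the sketch below shows the standard definition: word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name word_error_rate and the lowercase, whitespace-split normalization are assumptions for illustration; the project's benchmark may normalize text differently or use a library.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumed normalization: lowercase and split on whitespace.
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a WER of about 7% means roughly seven word-level errors per hundred reference words.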

Maintenance & Community

The project has released v1.0-RC and v1.1, with plans for v1.2 to integrate with py-speech. No specific community channels or contributor details are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

The README notes that real-time streaming may come at the expense of accuracy compared to batch processing. It also lists future integration plans, but with the last commit 6 months ago, the current pace of development is unclear.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 24 stars in the last 30 days
