whisper-flow by dimastatz

Real-time audio transcription using OpenAI Whisper

Created 1 year ago

487 stars

Top 63.3% on SourcePulse

Project Summary

This project provides a real-time transcription service built on OpenAI's Whisper model, so applications can process streaming audio and receive text output while the audio is still arriving. It is aimed at developers and researchers who need to integrate live speech-to-text into their systems and want results sooner than batch processing, which waits for the complete recording, can deliver.

How It Works

Whisper Flow splits an incoming audio stream into segments at natural boundaries in speech, such as pauses or speaker changes, and manages those segments with a "tumbling window" approach. Transcriptions are returned incrementally: each response for a segment is a partial result, and the last one is marked as final, so applications can update displayed text in place. The design prioritizes low latency for real-time interaction.
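The windowing idea can be illustrated with a minimal sketch. This is not the project's implementation: the names tumbling_window, transcribe, is_boundary, and render, and the is_partial/text result keys, are hypothetical. An empty chunk stands in for a detected pause, and a stand-in transcriber is used instead of a Whisper model.

```python
from typing import Callable, Iterable, Iterator


def tumbling_window(
    chunks: Iterable[bytes],
    transcribe: Callable[[list[bytes]], str],
    is_boundary: Callable[[bytes], bool],
) -> Iterator[dict]:
    """Yield partial results while a window grows, then one final result."""
    window: list[bytes] = []
    for chunk in chunks:
        if is_boundary(chunk):
            if window:
                # Segment complete: emit the final text and start a fresh window.
                yield {"is_partial": False, "text": transcribe(window)}
                window = []
            continue
        window.append(chunk)
        # Re-transcribe the whole window so earlier words can be revised.
        yield {"is_partial": True, "text": transcribe(window)}
    if window:
        # Flush a trailing segment that had no closing boundary.
        yield {"is_partial": False, "text": transcribe(window)}


def render(results: Iterable[dict]) -> list[str]:
    """Keep only finalized segments, as a transcript view might."""
    return [r["text"] for r in results if not r["is_partial"]]


def fake_transcribe(window: list[bytes]) -> str:
    """Stand-in transcriber: each 'audio chunk' is already a UTF-8 word."""
    return " ".join(c.decode() for c in window)


# Assumption for the demo: an empty chunk marks a pause in speech.
stream = [b"hello", b"world", b"", b"good", b"morning"]
events = list(tumbling_window(stream, fake_transcribe, lambda c: c == b""))
final_lines = render(events)
```

In this sketch a client would overwrite the on-screen text for every partial event and commit it once the final event arrives. Re-transcribing the whole window on each chunk is the simplest choice here; a real system would likely bound the window size to keep latency stable.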

Quick Start & Requirements

To run as a web server:

git clone https://github.com/dimastatz/whisper-flow.git
cd whisper-flow
./run.sh -local
source .venv/bin/activate
./run.sh -benchmark

As a Python package:

pip install whisperflow

Requires Python and potentially specific audio libraries. Benchmarking was performed on a MacBook Air M1 with 16GB RAM.

Highlighted Details

Achieves sub-500ms latency and approximately 7% Word Error Rate (WER) on a MacBook Air M1.
Supports real-time transcription of streaming audio via WebSocket endpoints.
Outputs incremental transcriptions with IsPartial flags for dynamic updates.
Benchmarking relies on the LibriSpeech dataset and metrics like WER and latency.
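For readers unfamiliar with the WER figure above, the metric is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words: WER = (substitutions + deletions + insertions) / N. The following is a generic sketch of that calculation, not the project's benchmark code. The function name word_error_rate is hypothetical, and lowercasing before comparison is an assumption; the project's benchmark may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
example = word_error_rate(
    "the quick brown fox jumps",
    "the quick brown box jumps",
)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.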

Maintenance & Community

The project has released v1.0-RC and v1.1, with plans for v1.2 to integrate with py-speech. No specific community channels or contributor details are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

The README notes that real-time streaming may come at the expense of accuracy compared to batch processing. Future integration plans are listed, but the last commit was 10 months ago, so it is unclear how actively the project is being developed.

Health Check

Last Commit

10 months ago

Responsiveness

Inactive

Star History

133 stars in the last 30 days