whisper_streaming by ufal

Real-time streaming for long speech-to-text transcription/translation

Created 2 years ago
3,334 stars

Top 14.5% on SourcePulse

View on GitHub
Project Summary

This repository provides a real-time streaming speech-to-text and translation system built upon OpenAI's Whisper model. It addresses the challenge of Whisper's non-streaming nature for long-form audio, enabling applications like live transcription services and multilingual conference support. The target audience includes developers and researchers working with real-time audio processing and speech recognition.

How It Works

Whisper-Streaming achieves real-time performance with a "local agreement policy with self-adaptive latency". It processes the incoming audio in growing chunks and emits only the words on which consecutive Whisper updates agree, keeping the rest tentative until a later update confirms it. This lets latency adapt dynamically while maintaining transcription quality on unsegmented long-form speech.
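The confirmation step can be sketched in a few lines of Python. This is a toy illustration of the local-agreement idea (longest common word prefix of two consecutive hypotheses), not the project's actual implementation:

```python
def local_agreement(prev_hyp, curr_hyp):
    """Return the words confirmed by two consecutive hypotheses.

    Words that appear identically at the start of two successive
    Whisper updates are treated as confirmed and can be emitted
    immediately; everything after the first disagreement stays
    tentative until a later update agrees on it.
    """
    confirmed = []
    for prev_word, curr_word in zip(prev_hyp, curr_hyp):
        if prev_word != curr_word:
            break
        confirmed.append(prev_word)
    return confirmed

# Two consecutive (hypothetical) updates over a growing audio buffer:
update_1 = ["hello", "world", "this", "is"]
update_2 = ["hello", "world", "this", "was", "a", "test"]
print(local_agreement(update_1, update_2))  # → ['hello', 'world', 'this']
```

Because confirmation waits for at least two agreeing updates, latency grows when the model is unstable on a passage and shrinks when it is confident, which is the "self-adaptive" part of the policy.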

Quick Start & Requirements

  • Installation: pip install librosa soundfile
  • Whisper Backend: Requires installation of a backend like faster-whisper (recommended for GPU, requires CUDA >= 11.7), whisper-timestamped, openai-api (no GPU needed, but incurs costs), or mlx-whisper (for Apple Silicon).
  • Voice Activity Controller (Optional but Recommended): pip install torch torchaudio
  • Sentence Segmenter (Optional): Required for "sentence" buffer trimming; installation varies by language (e.g., opus-fast-mosestokenizer, tokenize_uk, wtpsplit).
  • Usage Example: python3 whisper_online.py audio_path --language en
  • Documentation: Code comments in whisper_online.py serve as full documentation.

Highlighted Details

  • Achieves 3.3 seconds of latency on long-form speech transcription.
  • Supports transcription and translation tasks.
  • Integrates Voice Activity Detection (VAD) and a Voice Activity Controller (VAC).
  • Offers multiple buffer trimming strategies ("segment" and "sentence").
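The two trimming strategies can be contrasted with a toy sketch (hypothetical function names, not the project's code): the rolling audio buffer is cut either at the end of the last confirmed Whisper segment or at the end of the last completed sentence, so the model never re-decodes audio that is already final.

```python
def trim_point(confirmed_segment_end, confirmed_sentence_end,
               strategy="segment"):
    """Pick the timestamp (in seconds) at which the audio buffer is cut.

    "segment"  — cut at Whisper's own last confirmed segment boundary
                 (works for any language, no extra dependencies).
    "sentence" — cut at the last completed sentence, which requires a
                 language-specific sentence segmenter to be installed.
    """
    if strategy == "sentence":
        return confirmed_sentence_end
    return confirmed_segment_end

# Sentence boundaries usually trail segment boundaries slightly,
# so "sentence" trimming keeps a little more context in the buffer:
print(trim_point(12.4, 10.8, strategy="sentence"))  # → 10.8
print(trim_point(12.4, 10.8))                       # → 12.4
```

This is why the "sentence" option carries the extra installation burden noted below: it needs a per-language tokenizer to know where sentences end.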

Maintenance & Community

  • Contributions are welcome.
  • Credits include Peter Polák for the original idea and the Silero Team for their VAD model.
  • Contact: Dominik Macháček, machacek@ufal.mff.cuni.cz.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the provided README. However, it relies on Whisper, which is typically released under the MIT license. Backend dependencies may have their own licenses.

Limitations & Caveats

  • The "sentence" buffer trimming option requires installing language-specific sentence segmenters, which can be complex and may not be available for all supported Whisper languages.
  • Using the OpenAI API backend incurs costs and requires careful monitoring.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 113 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Travis Fischer (founder of Agentic).

RealtimeSTT by KoljaB

  • Top 0.5%, 9k stars
  • Speech-to-text library for realtime applications
  • Created 2 years ago; updated 2 months ago