WhisperLive by Collabora

Real-time transcription app using OpenAI's Whisper

created 2 years ago
3,174 stars

Top 15.5% on sourcepulse

Project Summary

WhisperLive provides a near real-time speech-to-text transcription service, leveraging OpenAI's Whisper model. It's designed for developers and users needing to transcribe live audio streams or pre-recorded files efficiently, offering multiple backend options for performance tuning.

How It Works

WhisperLive operates with a client-server architecture. The server supports three backends: faster_whisper for general use, NVIDIA's TensorRT-LLM for optimized GPU inference, and OpenVINO for Intel hardware acceleration. The server can dynamically load different Whisper model sizes per client or use a pre-loaded custom model for all clients. The client connects to the server to send audio data and receive transcribed text.
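The two model-handling modes described above can be pictured as a small cache: one loaded model per requested size, or a single pinned model served to every client. The sketch below is a hypothetical illustration of that idea (the class name and stubbed loader are my own, not the project's actual server code):

```python
class ModelCache:
    """Illustrates the two server modes: load a Whisper model per requested
    size, or pin one pre-loaded custom model for all clients."""

    def __init__(self, pinned_model=None):
        self.pinned = pinned_model   # if set, single-model mode
        self.cache = {}              # model size -> loaded model (stubbed)

    def get(self, size):
        if self.pinned is not None:
            return self.pinned       # every client gets the pre-loaded model
        if size not in self.cache:
            # Stand-in for actually loading a Whisper model of this size.
            self.cache[size] = f"whisper-{size}"
        return self.cache[size]
```

In single-model mode the per-size cache is bypassed entirely, which is why pre-loading a custom model avoids the extra VRAM use and connection latency of per-client loading noted in the limitations below.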

Quick Start & Requirements

  • Installation: pip install whisper-live or use Docker images.
  • Backends: Requires specific setup for TensorRT (see TensorRT_whisper.md) and OpenVINO.
  • Dependencies: PyAudio, faster-whisper, TensorRT-LLM, or OpenVINO runtime. GPU with CUDA 11+ is recommended for TensorRT.
  • Running Server (Faster Whisper): python3 run_server.py --port 9090 --backend faster_whisper
  • Running Client: from whisper_live.client import TranscriptionClient; client = TranscriptionClient("localhost", 9090); then call client("audio.wav") for a file or client() for the microphone.
  • Docker: docker run -it --gpus all -p 9090:9090 ghcr.io/collabora/whisperlive-gpu:latest
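Put together, the client-side steps above look like the sketch below. It assumes whisper-live is installed and a server is already listening on localhost:9090; only the constructor arguments shown in the quick start are used, and the import guard is there just so the snippet degrades gracefully:

```python
def transcribe(path=None):
    """Connect to a running WhisperLive server and transcribe a file or the mic."""
    try:
        from whisper_live.client import TranscriptionClient
    except ImportError:
        # whisper-live is not installed in this environment.
        return "missing dependency: pip install whisper-live"

    client = TranscriptionClient("localhost", 9090)  # host/port of the running server
    if path is not None:
        client(path)   # transcribe a pre-recorded audio file
    else:
        client()       # stream from the default microphone
    return "ok"

# Usage (with a server running):
#   transcribe("audio.wav")   # transcribe a file
#   transcribe()              # transcribe live microphone input
```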

Highlighted Details

  • Supports faster_whisper, TensorRT, and OpenVINO backends for flexible performance.
  • Can transcribe live microphone input, audio files, RTSP, and HLS streams.
  • Offers optional Voice Activity Detection (VAD) and audio recording.
  • Browser extensions are available for direct browser-based transcription.
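The optional VAD listed above decides, per audio frame, whether there is speech worth sending through the model. WhisperLive ships a model-based VAD; the energy-threshold version below is only a hypothetical illustration of the concept (the threshold value is arbitrary):

```python
import math

def frame_rms(samples):
    """Root-mean-square energy of one audio frame (a list of PCM samples)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, threshold=500.0):
    """Crude energy-based VAD: call a frame speech if its RMS exceeds a threshold.
    Real systems, including WhisperLive, use trained VAD models instead."""
    return frame_rms(samples) > threshold

silent_frame = [10, -12, 8, -9] * 100            # low-energy frame
speech_frame = [4000, -3900, 4100, -3800] * 100  # high-energy frame
```

Gating frames this way keeps silence from being transcribed, at the cost of misclassifying quiet speech or loud noise, which is why production VADs are model-based.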

Maintenance & Community

Developed by Collabora. Contact is available via the Collabora website or the email addresses provided in the repository.

Licensing & Compatibility

The project appears to be Apache 2.0 licensed, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

The TensorRT backend setup is complex and requires building custom engines, with Docker recommended for easier deployment. Native OpenVINO setup requires manual installation of Intel drivers and runtime. The default server behavior of loading a new model per client can increase VRAM usage and initial connection latency.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 7
  • Issues (30d): 3

Star History

419 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Travis Fischer (founder of Agentic).

RealtimeSTT by KoljaB (0.9%, 8k stars)

Speech-to-text library for realtime applications; created 1 year ago, updated 3 weeks ago.