WhisperLive by Collabora

Real-time transcription app using OpenAI's Whisper

created 2 years ago
3,174 stars

Top 15.5% on sourcepulse

Project Summary

WhisperLive provides a near real-time speech-to-text transcription service, leveraging OpenAI's Whisper model. It's designed for developers and users needing to transcribe live audio streams or pre-recorded files efficiently, offering multiple backend options for performance tuning.

How It Works

WhisperLive operates with a client-server architecture. The server supports three backends: faster_whisper for general use, NVIDIA's TensorRT-LLM for optimized GPU inference, and OpenVINO for Intel hardware acceleration. The server can dynamically load different Whisper model sizes per client or use a pre-loaded custom model for all clients. The client connects to the server to send audio data and receive transcribed text.
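The two model-handling modes described above can be pictured as a small cache: one loaded model per requested size, or a single pinned model served to every client. The sketch below is a hypothetical illustration of that idea (the class name and stubbed loader are my own, not the project's actual server code):

```python
class ModelCache:
    """Illustrates the two server modes: load a Whisper model per requested
    size, or pin one pre-loaded custom model for all clients."""

    def __init__(self, pinned_model=None):
        self.pinned = pinned_model   # if set, single-model mode
        self.cache = {}              # model size -> loaded model (stubbed)

    def get(self, size):
        if self.pinned is not None:
            return self.pinned       # every client gets the pre-loaded model
        if size not in self.cache:
            # Stand-in for actually loading a Whisper model of this size.
            self.cache[size] = f"whisper-{size}"
        return self.cache[size]
```

In single-model mode the per-size cache is bypassed entirely, which is why pre-loading a custom model avoids the extra VRAM use and connection latency of per-client loading noted in the limitations below.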

Quick Start & Requirements

  • Installation: pip install whisper-live or use Docker images.
  • Backends: Requires specific setup for TensorRT (see TensorRT_whisper.md) and OpenVINO.
  • Dependencies: PyAudio, faster-whisper, TensorRT-LLM, or OpenVINO runtime. GPU with CUDA 11+ is recommended for TensorRT.
  • Running Server (Faster Whisper): python3 run_server.py --port 9090 --backend faster_whisper
  • Running Client: from whisper_live.client import TranscriptionClient; client = TranscriptionClient("localhost", 9090); then call client("audio.wav") for a file or client() for the microphone.
  • Docker: docker run -it --gpus all -p 9090:9090 ghcr.io/collabora/whisperlive-gpu:latest
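Put together, the client-side steps above look like the sketch below. It assumes whisper-live is installed and a server is already listening on localhost:9090; only the constructor arguments shown in the quick start are used, and the import guard is there just so the snippet degrades gracefully:

```python
def transcribe(path=None):
    """Connect to a running WhisperLive server and transcribe a file or the mic."""
    try:
        from whisper_live.client import TranscriptionClient
    except ImportError:
        # whisper-live is not installed in this environment.
        return "missing dependency: pip install whisper-live"

    client = TranscriptionClient("localhost", 9090)  # host/port of the running server
    if path is not None:
        client(path)   # transcribe a pre-recorded audio file
    else:
        client()       # stream from the default microphone
    return "ok"

# Usage (with a server running):
#   transcribe("audio.wav")   # transcribe a file
#   transcribe()              # transcribe live microphone input
```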

Highlighted Details

  • Supports faster_whisper, TensorRT, and OpenVINO backends for flexible performance.
  • Can transcribe live microphone input, audio files, RTSP, and HLS streams.
  • Offers optional Voice Activity Detection (VAD) and audio recording.
  • Browser extensions are available for direct browser-based transcription.
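The optional VAD listed above decides, per audio frame, whether there is speech worth sending through the model. WhisperLive ships a model-based VAD; the energy-threshold version below is only a hypothetical illustration of the concept (the threshold value is arbitrary):

```python
import math

def frame_rms(samples):
    """Root-mean-square energy of one audio frame (a list of PCM samples)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, threshold=500.0):
    """Crude energy-based VAD: call a frame speech if its RMS exceeds a threshold.
    Real systems, including WhisperLive, use trained VAD models instead."""
    return frame_rms(samples) > threshold

silent_frame = [10, -12, 8, -9] * 100            # low-energy frame
speech_frame = [4000, -3900, 4100, -3800] * 100  # high-energy frame
```

Gating frames this way keeps silence from being transcribed, at the cost of misclassifying quiet speech or loud noise, which is why production VADs are model-based.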

Maintenance & Community

Developed by Collabora. Contact is available via the Collabora website or the email addresses provided in the repository.

Licensing & Compatibility

The project appears to be Apache 2.0 licensed, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

The TensorRT backend setup is complex and requires building custom engines, with Docker recommended for easier deployment. Native OpenVINO setup requires manual installation of Intel drivers and runtime. The default server behavior of loading a new model per client can increase VRAM usage and initial connection latency.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 7
  • Issues (30d): 3

Star History

419 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Travis Fischer (founder of Agentic).

RealtimeSTT by KoljaB (0.9%, 8k stars)

Speech-to-text library for realtime applications; created 1 year ago, updated 3 weeks ago.