Real-time transcription app using OpenAI's Whisper
WhisperLive provides a near real-time speech-to-text transcription service, leveraging OpenAI's Whisper model. It's designed for developers and users needing to transcribe live audio streams or pre-recorded files efficiently, offering multiple backend options for performance tuning.
How It Works
WhisperLive operates with a client-server architecture. The server supports three backends: faster_whisper for general use, NVIDIA's TensorRT-LLM for optimized GPU inference, and OpenVINO for Intel hardware acceleration. The server can dynamically load different Whisper model sizes per client, or serve a single pre-loaded custom model to all clients. The client connects to the server to send audio data and receive transcribed text.
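To make the streaming architecture concrete, here is a minimal sketch of the kind of exchange such a client performs. The wire format shown (a JSON config message, then raw PCM chunks, with transcript segments pushed back as JSON) is illustrative only, not WhisperLive's actual protocol; see whisper_live.client for the real implementation.

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def send_audio(ws, pcm_chunks):
    # Stream raw audio to the server, paced roughly like a live source.
    for chunk in pcm_chunks:  # each chunk: bytes of 16 kHz float32 PCM
        await ws.send(chunk)
        await asyncio.sleep(0.1)


async def receive_text(ws):
    # Print transcript segments as the server pushes them back.
    async for message in ws:
        print(json.loads(message).get("text", ""))


async def stream(pcm_chunks, uri="ws://localhost:9090"):
    async with websockets.connect(uri) as ws:
        # Hypothetical handshake: tell the server which model/language to use.
        await ws.send(json.dumps({"model": "small", "language": "en"}))
        await asyncio.gather(send_audio(ws, pcm_chunks), receive_text(ws))
```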
Quick Start & Requirements
- Install the client: pip install whisper-live, or use the provided Docker images.
- Setup guides are available for TensorRT (TensorRT_whisper.md) and OpenVINO.
- Depending on the backend, the faster-whisper, TensorRT-LLM, or OpenVINO runtime is required; a GPU with CUDA 11+ is recommended for TensorRT.
- Start the server: python3 run_server.py --port 9090 --backend faster_whisper
- Transcribe a file: from whisper_live.client import TranscriptionClient; client = TranscriptionClient("localhost", 9090); client("audio.wav"). Call client() with no arguments to transcribe from the microphone instead (a fuller example follows this list).
- Run the GPU server in Docker: docker run -it --gpus all -p 9090:9090 ghcr.io/collabora/whisperlive-gpu:latest
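For reference, a slightly fuller client invocation is sketched below. The host and port come from the quick-start commands above; the keyword arguments (lang, translate, model, use_vad) reflect options shown in the project's README, but verify them against the current TranscriptionClient signature before relying on them.

```python
from whisper_live.client import TranscriptionClient

# Connect to a running WhisperLive server (started as shown above).
client = TranscriptionClient(
    "localhost",
    9090,
    lang="en",        # transcription language
    translate=False,  # set True to translate to English instead
    model="small",    # Whisper model size the server should load
    use_vad=True,     # voice-activity detection to skip silence
)

client("audio.wav")  # transcribe a file; call client() to use the microphone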
Highlighted Details
Supports faster_whisper, TensorRT, and OpenVINO backends for flexible performance.
Maintenance & Community
Developed by Collabora. Contact is available via their website or the email addresses listed in the repository.
Licensing & Compatibility
The project appears to be Apache 2.0 licensed, allowing for commercial use and integration into closed-source projects.
Limitations & Caveats
The TensorRT backend setup is complex and requires building custom engines, with Docker recommended for easier deployment. Native OpenVINO setup requires manual installation of Intel drivers and runtime. The default server behavior of loading a new model per client can increase VRAM usage and initial connection latency.
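If per-client model loading is a concern, the server can instead be started with a single custom model preloaded for all clients, avoiding repeated loads at connection time. The exact flag may vary by version (check python3 run_server.py --help); the README shows an invocation along these lines for the faster_whisper backend:

python3 run_server.py --port 9090 --backend faster_whisper -fw "/path/to/custom/faster_whisper_model"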