Python package for real-time, local speech-to-text
WhisperLiveKit provides a fully local, real-time speech-to-text and speaker diarization solution. It's designed for developers and researchers needing to integrate live transcription into applications, offering a FastAPI backend, a customizable web interface, and robust local processing capabilities. The primary benefit is enabling private, on-device transcription without external API dependencies.
How It Works
The system comprises a frontend for audio capture and streaming via WebSockets, a FastAPI backend to manage these connections and process audio, and a server-agnostic core library. Audio is captured using the browser's MediaRecorder API, streamed to the server, decoded with FFmpeg, and fed into Whisper for transcription. It supports automatic silence chunking, multi-user handling, and optional confidence validation for faster inference.
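To make the data flow concrete, here is a minimal sketch of that pipeline shape, not WhisperLiveKit's actual code: a FastAPI WebSocket endpoint that accepts the browser's MediaRecorder chunks and replies with a transcript after each chunk. The /asr path and the transcribe helper are hypothetical placeholders.

```python
# Illustrative sketch only -- not WhisperLiveKit's real implementation.
# A FastAPI WebSocket endpoint receives MediaRecorder chunks streamed by the
# browser, buffers them, and returns a (placeholder) transcript per chunk.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

def transcribe(audio_bytes: bytes) -> str:
    # Placeholder for the real pipeline (FFmpeg decode + Whisper inference).
    return f"received {len(audio_bytes)} bytes so far"

@app.websocket("/asr")  # hypothetical path; the real endpoint may differ
async def asr_endpoint(websocket: WebSocket) -> None:
    await websocket.accept()
    buffer = bytearray()
    try:
        while True:
            chunk = await websocket.receive_bytes()  # one MediaRecorder chunk
            buffer.extend(chunk)
            await websocket.send_json({"text": transcribe(bytes(buffer))})
    except WebSocketDisconnect:
        pass  # client closed the stream
```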
Quick Start & Requirements
Install the package with pip install whisperlivekit.
FFmpeg is required (install via apt, brew, or download).
Dependencies include torch, mosestokenizer, wtpsplit, diart, and various Whisper backends (whisperlivekit[whisper], whisperlivekit[mlx-whisper], etc.).
Start the server with whisperlivekit-server --model tiny.en and open http://localhost:8000 in a browser.
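Beyond the bundled web interface, a custom client only needs to stream audio over a WebSocket. The sketch below is an unofficial example under assumptions: the /asr endpoint path and the accepted audio format are not confirmed by this document, so check the project docs before relying on them.

```python
# Minimal client sketch, not an official example. The /asr WebSocket path and
# the accepted audio format are assumptions -- the bundled web UI streams
# browser MediaRecorder (e.g. WebM/Opus) chunks, so a recorded file is used here.
import asyncio
import websockets

async def stream_file(path: str, url: str = "ws://localhost:8000/asr") -> None:
    async with websockets.connect(url) as ws:
        with open(path, "rb") as f:
            while chunk := f.read(16_000):   # stream the file in small chunks
                await ws.send(chunk)
                print(await ws.recv())       # server reply (e.g. partial transcript)

if __name__ == "__main__":
    asyncio.run(stream_file("sample.webm"))
```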
Highlighted Details
Maintenance & Community
The project is actively maintained by QuentinFuxa. Contributions are welcomed via pull requests. Links to the GitHub repository and issue tracker are provided.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
Performance heavily relies on local hardware, especially for larger models or CPU-only execution. Docker deployment requires careful configuration for GPU access and memory allocation. Pyannote diarization requires manual HuggingFace model acceptance and login.
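For the pyannote requirement, one common way to authenticate from Python is sketched below; whether WhisperLiveKit picks up the credential this way (rather than via an environment variable or a cached CLI login) is an assumption, so consult its documentation.

```python
# Hedged sketch: authenticate to Hugging Face so gated pyannote diarization
# models can be downloaded after accepting their terms on the model pages.
import os
from huggingface_hub import login

# Uses HF_TOKEN from the environment if set; otherwise prompts interactively.
login(token=os.environ.get("HF_TOKEN"))
```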