Python package for real-time, local speech-to-text
WhisperLiveKit provides a fully local, real-time speech-to-text and speaker diarization solution. It's designed for developers and researchers needing to integrate live transcription into applications, offering a FastAPI backend, a customizable web interface, and robust local processing capabilities. The primary benefit is enabling private, on-device transcription without external API dependencies.
How It Works
The system comprises a frontend for audio capture and streaming via WebSockets, a FastAPI backend to manage these connections and process audio, and a server-agnostic core library. Audio is captured using the browser's MediaRecorder API, streamed to the server, decoded with FFmpeg, and fed into Whisper for transcription. It supports automatic silence chunking, multi-user handling, and optional confidence validation for faster inference.
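To make the data flow concrete, here is a minimal sketch of that pipeline shape, not WhisperLiveKit's actual code: a FastAPI WebSocket endpoint that accepts the browser's MediaRecorder chunks and replies with a transcript after each chunk. The /asr path and the transcribe helper are hypothetical placeholders.

```python
# Illustrative sketch only -- not WhisperLiveKit's real implementation.
# A FastAPI WebSocket endpoint receives MediaRecorder chunks streamed by the
# browser, buffers them, and returns a (placeholder) transcript per chunk.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

def transcribe(audio_bytes: bytes) -> str:
    # Placeholder for the real pipeline (FFmpeg decode + Whisper inference).
    return f"received {len(audio_bytes)} bytes so far"

@app.websocket("/asr")  # hypothetical path; the real endpoint may differ
async def asr_endpoint(websocket: WebSocket) -> None:
    await websocket.accept()
    buffer = bytearray()
    try:
        while True:
            chunk = await websocket.receive_bytes()  # one MediaRecorder chunk
            buffer.extend(chunk)
            await websocket.send_json({"text": transcribe(bytes(buffer))})
    except WebSocketDisconnect:
        pass  # client closed the stream
```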
Quick Start & Requirements
Install the package with pip install whisperlivekit.
FFmpeg is required (install via apt, brew, or download).
Dependencies include torch, mosestokenizer, wtpsplit, diart, and various Whisper backends (whisperlivekit[whisper], whisperlivekit[mlx-whisper], etc.).
Start the server with whisperlivekit-server --model tiny.en and open http://localhost:8000 in a browser.
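Beyond the bundled web interface, a custom client only needs to stream audio over a WebSocket. The sketch below is an unofficial example under assumptions: the /asr endpoint path and the accepted audio format are not confirmed by this document, so check the project docs before relying on them.

```python
# Minimal client sketch, not an official example. The /asr WebSocket path and
# the accepted audio format are assumptions -- the bundled web UI streams
# browser MediaRecorder (e.g. WebM/Opus) chunks, so a recorded file is used here.
import asyncio
import websockets

async def stream_file(path: str, url: str = "ws://localhost:8000/asr") -> None:
    async with websockets.connect(url) as ws:
        with open(path, "rb") as f:
            while chunk := f.read(16_000):   # stream the file in small chunks
                await ws.send(chunk)
                print(await ws.recv())       # server reply (e.g. partial transcript)

if __name__ == "__main__":
    asyncio.run(stream_file("sample.webm"))
```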
Highlighted Details
Maintenance & Community
The project is actively maintained by QuentinFuxa. Contributions are welcomed via pull requests. Links to the GitHub repository and issue tracker are provided.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
Performance heavily relies on local hardware, especially for larger models or CPU-only execution. Docker deployment requires careful configuration for GPU access and memory allocation. Pyannote diarization requires manual HuggingFace model acceptance and login.
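For the pyannote requirement, one common way to authenticate from Python is sketched below; whether WhisperLiveKit picks up the credential this way (rather than via an environment variable or a cached CLI login) is an assumption, so consult its documentation.

```python
# Hedged sketch: authenticate to Hugging Face so gated pyannote diarization
# models can be downloaded after accepting their terms on the model pages.
import os
from huggingface_hub import login

# Uses HF_TOKEN from the environment if set; otherwise prompts interactively.
login(token=os.environ.get("HF_TOKEN"))
```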