QuentinFuxa / WhisperLiveKit: Python package for real-time, local speech-to-text
Top 6.3% on SourcePulse
WhisperLiveKit provides a fully local, real-time speech-to-text and speaker diarization solution. It targets developers and researchers who need to integrate live transcription into applications, offering a FastAPI backend, a customizable web interface, and fully local processing. The primary benefit is private, on-device transcription with no external API dependencies.
How It Works
The system comprises a frontend for audio capture and streaming via WebSockets, a FastAPI backend to manage these connections and process audio, and a server-agnostic core library. Audio is captured using the browser's MediaRecorder API, streamed to the server, decoded with FFmpeg, and fed into Whisper for transcription. It supports automatic silence chunking, multi-user handling, and optional confidence validation for faster inference.
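As a rough illustration of the streaming flow, the sketch below sends audio bytes to the server over a WebSocket and prints whatever JSON messages come back. The endpoint path (/asr), the input file name, and the message format are assumptions for illustration; check the project's documentation for the actual protocol.

```python
# Minimal streaming-client sketch (endpoint path and message format are assumptions).
import asyncio
import json
import websockets  # pip install websockets

SERVER_URL = "ws://localhost:8000/asr"  # assumed WebSocket endpoint; verify in the docs
CHUNK_SIZE = 16000                      # bytes per send, arbitrary for illustration

async def stream_file(path: str) -> None:
    async with websockets.connect(SERVER_URL) as ws:
        # Send the recording in small chunks to mimic live capture.
        with open(path, "rb") as audio:
            while chunk := audio.read(CHUNK_SIZE):
                await ws.send(chunk)
                await asyncio.sleep(0.1)  # pace the stream roughly like real time
        # Read transcription messages until the server closes the connection.
        try:
            while True:
                print(json.loads(await ws.recv()))
        except websockets.ConnectionClosed:
            pass

if __name__ == "__main__":
    asyncio.run(stream_file("sample.webm"))  # hypothetical input file
```

In the packaged setup, the bundled web interface normally handles capture and streaming via MediaRecorder; a client like this is only needed when embedding transcription into your own application.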
Quick Start & Requirements
- Install the package: pip install whisperlivekit
- FFmpeg is required (install via apt, brew, or a direct download)
- Core dependencies include torch, mosestokenizer, wtpsplit, and diart; Whisper backends are installed as extras (whisperlivekit[whisper], whisperlivekit[mlx-whisper], etc.)
- Start the server: whisperlivekit-server --model tiny.en
- Open the web interface at http://localhost:8000
Highlighted Details
Maintenance & Community
The project is actively maintained by QuentinFuxa. Contributions are welcomed via pull requests. Links to the GitHub repository and issue tracker are provided.
Licensing & Compatibility
Licensed under the MIT License, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
Performance depends heavily on local hardware, especially for larger models or CPU-only execution. Docker deployment requires careful configuration for GPU access and memory allocation. Pyannote diarization requires accepting the gated model terms on Hugging Face and logging in.
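For the diarization caveat, accepting the gated pyannote model terms on the Hugging Face model pages and authenticating locally is a one-time step. A minimal sketch using the huggingface_hub client; the token value is a placeholder:

```python
# One-time authentication sketch for gated pyannote models (token is a placeholder).
from huggingface_hub import login

# Accept the model terms on the relevant Hugging Face model pages first,
# then authenticate with a read token so the models can be downloaded.
login(token="hf_xxx")  # or run `huggingface-cli login` interactively
```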