WhisperLiveKit by QuentinFuxa

Python package for real-time, local speech-to-text

created 7 months ago
407 stars

Top 72.6% on sourcepulse

Project Summary

WhisperLiveKit provides fully local, real-time speech-to-text with speaker diarization. It is designed for developers and researchers who need to integrate live transcription into applications, and it ships a FastAPI backend and a customizable web interface. The primary benefit is private, on-device transcription with no dependency on external APIs.

How It Works

The system comprises a frontend for audio capture and streaming via WebSockets, a FastAPI backend to manage these connections and process audio, and a server-agnostic core library. Audio is captured using the browser's MediaRecorder API, streamed to the server, decoded with FFmpeg, and fed into Whisper for transcription. It supports automatic silence chunking, multi-user handling, and optional confidence validation for faster inference.
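The decode step in the pipeline above can be sketched as follows. This is a minimal illustration, not WhisperLiveKit's actual code: the function names `build_ffmpeg_decode_cmd` and `pcm16_to_float` are hypothetical, and the 16 kHz mono 16-bit PCM target is an assumption based on the input format Whisper models commonly expect.

```python
import array
import sys


def build_ffmpeg_decode_cmd(sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg argv that reads compressed audio (e.g. browser
    MediaRecorder output) on stdin and writes raw PCM on stdout.

    Hypothetical helper; the sample rate default is an assumption.
    """
    return [
        "ffmpeg",
        "-loglevel", "error",   # keep stderr quiet
        "-i", "pipe:0",         # read the input stream from stdin
        "-f", "s16le",          # raw signed 16-bit little-endian output
        "-acodec", "pcm_s16le",
        "-ac", "1",             # downmix to mono
        "-ar", str(sample_rate),
        "pipe:1",               # write decoded audio to stdout
    ]


def pcm16_to_float(pcm: bytes) -> list[float]:
    """Convert little-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    # Drop a trailing odd byte so the buffer holds whole 2-byte samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()      # the PCM data is little-endian
    return [s / 32768.0 for s in samples]
```

In a server, the command would typically be started once per connection with `subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)`, with WebSocket chunks written to its stdin and decoded PCM read from its stdout before being passed to the transcription model.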

Quick Start & Requirements

  • Install via pip: pip install whisperlivekit
  • System dependency: FFmpeg (installable via apt, brew, or download).
  • Optional dependencies for enhanced features: torch, mosestokenizer, wtpsplit, diart, and various Whisper backends (whisperlivekit[whisper], whisperlivekit[mlx-whisper], etc.).
  • For diarization, HuggingFace Hub login and acceptance of pyannote.audio model conditions are required.
  • Start server: whisperlivekit-server --model tiny.en
  • Access web interface: http://localhost:8000
  • Official Docs: https://github.com/QuentinFuxa/WhisperLiveKit
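The quick-start steps above can be wrapped in a small preflight check. This is a sketch under stated assumptions: `build_server_cmd` and `missing_prerequisites` are hypothetical helper names, and only the `whisperlivekit-server --model tiny.en` command itself comes from the steps above.

```python
import shutil


def build_server_cmd(model: str = "tiny.en") -> list[str]:
    """Return the server launch command from the quick start as an argv list."""
    return ["whisperlivekit-server", "--model", model]


def missing_prerequisites(which=shutil.which) -> list[str]:
    """List required executables that are not found on PATH.

    `which` is injectable so the check can be exercised without touching
    the real environment (hypothetical design choice for this sketch).
    """
    required = ["ffmpeg", "whisperlivekit-server"]
    return [name for name in required if which(name) is None]
```

If `missing_prerequisites()` returns an empty list, the server can be started with `subprocess.run(build_server_cmd(), check=True)` and the web interface opened at http://localhost:8000.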

Highlighted Details

  • Real-time transcription and speaker diarization.
  • Fully local processing, ensuring data privacy.
  • Multi-user support via WebSocket handling.
  • Optimized backends including MLX Whisper for Apple Silicon.
  • Confidence validation and sentence-based buffer trimming options.

Maintenance & Community

The project is actively maintained by QuentinFuxa. Contributions are welcomed via pull requests. Links to the GitHub repository and issue tracker are provided.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Performance heavily relies on local hardware, especially for larger models or CPU-only execution. Docker deployment requires careful configuration for GPU access and memory allocation. Pyannote diarization requires manual HuggingFace model acceptance and login.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 5

Star History

183 stars in the last 90 days
