Real-time speech-to-text web app using Whisper
This project provides a web-based playground for building real-time speech-to-text applications using OpenAI's Whisper model, enhanced with Diart and Pyannote for speaker diarization. It targets developers and researchers looking to quickly prototype and deploy multilingual speech transcription and speaker identification features.
How It Works
The playground leverages the faster-whisper library for efficient transcription, diart for real-time voice activity detection, and pyannote.audio for speaker embedding and diarization. This combination allows for low-latency, multi-language speech processing and speaker segmentation within a web application framework.
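To illustrate how transcription and diarization output can be combined, here is a minimal sketch that labels each transcribed segment with the speaker whose turn overlaps it most. It assumes both stages produce timestamped results in seconds. The `Turn`, `Segment`, `overlap`, and `label_segments` names are hypothetical and are not taken from the project or from the faster-whisper, diart, or pyannote.audio APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """Hypothetical diarization result: who spoke between start and end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """Hypothetical transcription result: text spoken between start and end (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Segment], turns: List[Turn]) -> List[Tuple[str, str]]:
    """Pair each segment's text with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker = "unknown"  # used when no turn overlaps the segment
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled


if __name__ == "__main__":
    turns = [Turn(0.0, 2.5, "speaker0"), Turn(2.5, 5.0, "speaker1")]
    segments = [
        Segment(0.2, 2.0, "Hello there."),
        Segment(2.4, 4.8, "Hi, how are you?"),
    ]
    for speaker, text in label_segments(segments, turns):
        print(f"{speaker}: {text}")
```

Picking the turn with the largest overlap is only one possible policy. A real-time system would also have to handle segments that arrive before the matching speaker turn is known.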
Quick Start & Requirements
Install: run install_playground.sh.
Start the backend: cd backend && python server.py
Start the interface: cd interface && yarn start
Access to the Pyannote models (pyannote/segmentation, pyannote/embedding, pyannote/speaker-diarization) requires accepting their terms and logging in via huggingface-cli.
For issues with safetensors, install Rust via brew install rust.
Highlighted Details
Maintenance & Community
No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.
Licensing & Compatibility
The repository and its code/model weights are released under the MIT License. This license permits commercial use and integration into closed-source projects.
Limitations & Caveats
Known bugs include uncontrolled speaker swapping in sequential mode and, in real-time mode, audio that does not meet the transcription timeout going untranscribed. The project has not been tested for all languages.