whisper-playground  by saharmor

Real-time speech-to-text web app using Whisper

created 2 years ago
816 stars

Top 44.3% on sourcepulse

Project Summary

This project provides a web-based playground for building real-time speech-to-text applications using OpenAI's Whisper model, enhanced with Diart and Pyannote for speaker diarization. It targets developers and researchers looking to quickly prototype and deploy multilingual speech transcription and speaker identification features.

How It Works

The playground combines three libraries: faster-whisper for efficient transcription, diart for real-time audio streaming and speaker diarization, and pyannote.audio models for speaker segmentation and embedding. Together they provide low-latency, multilingual transcription with speaker labels inside a web application.
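One way to picture how transcription and diarization outputs could be combined is to label each transcribed segment with the speaker whose turn overlaps it most. The sketch below illustrates that idea only; it is not taken from the playground's code. `Segment`, `Turn`, and `assign_speakers` are hypothetical names, and the timestamps in the usage note are invented.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical shape)."""
    start: float
    end: float
    text: str


@dataclass
class Turn:
    """A diarization turn: one speaker talking over a time span (hypothetical shape)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Pair each segment's text with the speaker whose turn overlaps it most.

    A segment that overlaps no turn is paired with None.
    """
    labeled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((best_speaker, seg.text))
    return labeled
```

For example, with turns `A` over 0.0 to 2.5 s and `B` over 2.5 to 6.0 s, a segment spanning 2.0 to 5.0 s would be attributed to `B`, since it overlaps `B` for 2.5 s and `A` for only 0.5 s.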

Quick Start & Requirements

  • Install via install_playground.sh.
  • Requires Conda and Yarn.
  • Backend: cd backend && python server.py.
  • Frontend: cd interface && yarn start.
  • The Hugging Face Hub models (pyannote/segmentation, pyannote/embedding, pyannote/speaker-diarization) are gated: accept their terms on the Hub, then log in with huggingface-cli login.
  • On macOS, if safetensors fails to build, install Rust with brew install rust.
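Before running the install script, it can help to confirm that the two required tools are on the PATH. The following is a small convenience sketch, not part of the project; `missing_tools` is a hypothetical helper, and it checks only that the `conda` and `yarn` executables can be found, not their versions.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str] = ("conda", "yarn"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("Conda and Yarn found on PATH.")
```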

Highlighted Details

  • Supports real-time and sequential transcription modes.
  • Configurable parameters include model size, language, transcription timeout, and beam size.
  • Enables speaker diarization for identifying different speakers in audio.
  • Offers a web interface for easy interaction and deployment.
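The configurable parameters listed above could be grouped into a single validated settings object. This is a sketch under assumptions: `PlaygroundConfig`, its field names, its defaults, and the accepted mode strings are all hypothetical, and the set of size names is an illustrative subset of Whisper model sizes rather than the list the playground actually exposes.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of Whisper model sizes (assumption, not the app's list).
WHISPER_SIZES = {"tiny", "base", "small", "medium", "large"}
# Hypothetical labels for the two transcription modes.
MODES = {"real-time", "sequential"}


@dataclass(frozen=True)
class PlaygroundConfig:
    """Hypothetical container for the playground's tunable parameters."""
    model_size: str = "small"
    language: Optional[str] = "en"          # None could mean "auto-detect"
    transcription_timeout: float = 5.0      # seconds; default is assumed
    beam_size: int = 1
    mode: str = "real-time"

    def __post_init__(self) -> None:
        # Reject values that could not describe a usable configuration.
        if self.model_size not in WHISPER_SIZES:
            raise ValueError(f"unknown model size: {self.model_size!r}")
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if self.transcription_timeout <= 0:
            raise ValueError("transcription_timeout must be positive")
        if self.beam_size < 1:
            raise ValueError("beam_size must be at least 1")
```

Validating in `__post_init__` means an invalid combination fails when the object is created rather than partway through a transcription session.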

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The repository and its code/model weights are released under the MIT License. This license permits commercial use and integration into closed-source projects.

Limitations & Caveats

Two bugs are known: in sequential mode, speaker labels may swap uncontrollably, and in real-time mode, audio that does not meet the transcription timeout is not transcribed. The project has not been tested for all languages.
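To make the real-time caveat concrete, here is a toy model under one assumed reading of the timeout: audio is handed to the transcriber only in full blocks of `timeout` seconds, so a trailing remainder shorter than the timeout is never sent. This interpretation is an assumption, not something confirmed by the project's code, and `split_by_timeout` is a hypothetical function.

```python
from typing import List, Tuple


def split_by_timeout(
    total_seconds: float, timeout: float
) -> Tuple[List[Tuple[float, float]], float]:
    """Split a recording into full `timeout`-second windows.

    Returns the list of (start, end) windows that would be transcribed
    and the length of the trailing audio that would be left over.
    """
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    windows: List[Tuple[float, float]] = []
    start = 0.0
    # Only emit a window once a full timeout's worth of audio is available.
    while start + timeout <= total_seconds:
        windows.append((start, start + timeout))
        start += timeout
    leftover = total_seconds - start
    return windows, leftover
```

Under this model, a 12-second recording with a 5-second timeout yields two windows and 2 seconds of untranscribed audio, and a 3-second recording yields no transcription at all.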

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days
