whisper-playground by saharmor

Real-time speech-to-text web app using Whisper

Created 3 years ago

819 stars

Top 43.3% on SourcePulse

Project Summary

This project provides a web-based playground for building real-time speech-to-text applications using OpenAI's Whisper model, enhanced with Diart and Pyannote for speaker diarization. It targets developers and researchers looking to quickly prototype and deploy multilingual speech transcription and speaker identification features.

How It Works

The playground uses the faster-whisper library for efficient transcription, Diart for real-time speaker diarization, and pyannote.audio for the underlying segmentation and speaker-embedding models. Together they provide low-latency, multilingual transcription with speaker segmentation inside a web application.
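One way to picture how transcription and diarization results can be combined is to label each transcribed segment with the speaker whose turn overlaps it most. The sketch below is only an illustration of that idea, not the project's actual code: the Segment and SpeakerTurn types, the assign_speakers helper, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A transcribed span of audio (hypothetical type for illustration)."""
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class SpeakerTurn:
    """A diarized span attributed to one speaker (hypothetical type)."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds shared by two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[SpeakerTurn]
) -> List[Tuple[Optional[str], str]]:
    """Pair each segment's text with the speaker whose turn overlaps it most.

    A segment that overlaps no turn is paired with None.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg.text))
    return labeled


if __name__ == "__main__":
    # Made-up sample data, standing in for real transcription and diarization output.
    segments = [Segment(0.0, 2.0, "Hello there."), Segment(2.1, 4.0, "Hi, how are you?")]
    turns = [SpeakerTurn(0.0, 2.05, "speaker0"), SpeakerTurn(2.05, 4.2, "speaker1")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Maximum overlap is a simple rule that tolerates small timing disagreements between the two models. A real-time system would also need to handle segments that arrive before their speaker turn is known.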

Quick Start & Requirements

Install via install_playground.sh.
Requires Conda and Yarn.
Backend: cd backend && python server.py.
Frontend: cd interface && yarn start.
The pyannote models on Hugging Face Hub (pyannote/segmentation, pyannote/embedding, pyannote/speaker-diarization) are gated: accept the terms for each model, then log in via huggingface-cli.
On macOS, if safetensors fails to build, install Rust via brew install rust.
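Before running the install script, it can help to confirm that the command-line tools named above are on the PATH. The following is a minimal sketch of such a check. It is not part of the repository, and the missing_tools helper and the REQUIRED_TOOLS list are hypothetical names derived from the requirements listed here.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools taken from the requirements above; adjust to your setup.
REQUIRED_TOOLS = ("conda", "yarn", "huggingface-cli")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH.

    The lookup function is injectable so the check can be exercised
    without depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks that the executables exist. It does not verify that the Hugging Face login succeeded or that the model terms were accepted.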

Highlighted Details

Supports real-time and sequential transcription modes.
Configurable parameters include model size, language, transcription timeout, and beam size.
Enables speaker diarization for identifying different speakers in audio.
Offers a web interface for easy interaction and deployment.
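The configurable parameters listed above could be grouped into a single validated settings object. The sketch below is a hypothetical illustration: the TranscriptionSettings class, its field names, its default values, and the assumption that the timeout is measured in seconds are not taken from the project.

```python
from dataclasses import dataclass

# The two transcription modes named above.
MODES = ("real-time", "sequential")


@dataclass(frozen=True)
class TranscriptionSettings:
    """Hypothetical container for the playground's configurable parameters."""
    model_size: str = "small"           # illustrative default
    language: str = "en"                # illustrative default
    mode: str = "real-time"
    transcription_timeout: float = 5.0  # assumed to be seconds; illustrative
    beam_size: int = 1                  # illustrative default

    def __post_init__(self) -> None:
        # Reject values that could not describe a valid configuration.
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}, got {self.mode!r}")
        if self.transcription_timeout <= 0:
            raise ValueError("transcription_timeout must be positive")
        if self.beam_size < 1:
            raise ValueError("beam_size must be at least 1")


if __name__ == "__main__":
    settings = TranscriptionSettings(model_size="base", mode="sequential", beam_size=5)
    print(settings)
```

Validating in __post_init__ surfaces a bad setting when the object is created, before any audio is processed.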

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The repository and its code/model weights are released under the MIT License. This license permits commercial use and integration into closed-source projects.

Limitations & Caveats

Known bugs: sequential mode may swap speakers uncontrollably, and in real-time mode, audio that does not meet the transcription timeout is not transcribed. The project has not been tested with all languages.

