Speech-to-text library for realtime applications
This library provides a robust, low-latency speech-to-text (STT) solution for real-time applications, with voice activity detection (VAD) and wake word activation. It is designed for voice assistants and other applications that need fast, accurate speech-to-text conversion, and it offers developers an easy-to-use interface.
How It Works
RealtimeSTT combines several components. Voice activity detection uses WebRTCVAD for initial detection and SileroVAD for more accurate confirmation. Transcription is powered by Faster-Whisper, which provides GPU-accelerated inference fast enough for real-time use. Wake word detection is handled by either Porcupine or OpenWakeWord, so developers can choose their activation engine.
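The flow described above can be modeled as a small, self-contained Python sketch. This is not RealtimeSTT's code: `fast_vad` and `accurate_vad` are hypothetical energy-threshold stand-ins for WebRTCVAD and SileroVAD, and `wake_detector` and `transcribe` are placeholder callables standing in for Porcupine/OpenWakeWord and Faster-Whisper. It assumes wake-word mode and shows the cascade idea: the cheap check runs on every frame, the stricter check only runs when the cheap one fires, and nothing is transcribed until a wake word opens the gate.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Assumption for this sketch: a frame is a short run of float samples in [-1.0, 1.0].
Frame = Sequence[float]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def fast_vad(frame: Frame) -> bool:
    """Hypothetical stand-in for WebRTCVAD: cheap, permissive energy gate."""
    return rms(frame) > 0.02


def accurate_vad(frame: Frame) -> bool:
    """Hypothetical stand-in for SileroVAD: stricter confirmation step."""
    return rms(frame) > 0.05


@dataclass
class WakeGatedPipeline:
    # Hypothetical callables standing in for Porcupine/OpenWakeWord and Faster-Whisper.
    wake_detector: Callable[[List[Frame]], bool]
    transcribe: Callable[[List[Frame]], str]
    listening: bool = False  # True after a wake word, until one command is transcribed
    buffer: List[Frame] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def is_speech(self, frame: Frame) -> bool:
        # `and` short-circuits: the second stage only runs when the first stage fires.
        return fast_vad(frame) and accurate_vad(frame)

    def feed(self, frame: Frame) -> None:
        if self.is_speech(frame):
            self.buffer.append(frame)  # still inside an utterance
            return
        if not self.buffer:
            return  # silence with nothing buffered
        utterance, self.buffer = self.buffer, []  # silence ends the utterance
        if not self.listening:
            # Gate closed: only check for the wake word, never transcribe.
            self.listening = self.wake_detector(utterance)
        else:
            self.outputs.append(self.transcribe(utterance))
            self.listening = False  # require the wake word again for the next command


if __name__ == "__main__":
    loud, medium, quiet = [0.1] * 160, [0.03] * 160, [0.0] * 160
    pipeline = WakeGatedPipeline(
        wake_detector=lambda u: len(u) == 2,  # toy rule: a 2-frame utterance is the "wake word"
        transcribe=lambda u: f"{len(u)} frames",
    )
    for f in [loud, quiet, loud, loud, quiet, loud, loud, loud, quiet, medium, quiet]:
        pipeline.feed(f)
    print(pipeline.outputs)  # ['3 frames']
```

In the toy run, the first one-frame utterance is ignored, the two-frame utterance opens the gate, the next utterance is transcribed, and the `medium` frame passes the permissive first stage but is rejected by the stricter second one.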
Quick Start & Requirements
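Install the package:
pip install RealtimeSTT
Linux system dependencies:
sudo apt-get update && sudo apt-get install python3-dev portaudio19-dev
macOS system dependencies:
brew install portaudio
GPU support (PyTorch wheels built for CUDA 11.8):
pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118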
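After installing the CUDA-enabled PyTorch wheels, a quick way to confirm that PyTorch actually sees the GPU is to query its own version attributes. This is a minimal diagnostic sketch, not part of RealtimeSTT; `cuda_status` is a hypothetical helper name, and it only uses standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
from typing import Dict, Optional


def cuda_status() -> Dict[str, Optional[object]]:
    """Hypothetical helper: report whether the installed PyTorch build can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is no CUDA build to inspect.
        return {"torch": None, "cuda_build": None, "gpu_available": False}
    return {
        "torch": str(torch.__version__),            # e.g. "2.5.1+cu118" for the CUDA 11.8 wheel
        "cuda_build": torch.version.cuda,           # None on CPU-only builds
        "gpu_available": torch.cuda.is_available(), # False if no usable GPU/driver is visible
    }


if __name__ == "__main__":
    print(cuda_status())
```

If `cuda_build` is `None`, a CPU-only wheel is installed; if it is set but `gpu_available` is `False`, PyTorch was built for CUDA but cannot reach a GPU through the current driver.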
Full CUDA setup involves installing the NVIDIA CUDA Toolkit and cuDNN.
Highlighted Details
Supports multiple Whisper model sizes (tiny to large-v2) and language auto-detection.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Incompatible ctranslate2 and cuDNN versions can cause model loading errors, which may require downgrading or upgrading one of them.
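When diagnosing such loading errors, it can help to record the versions actually installed before changing anything. The sketch below uses a hypothetical helper, `gpu_stack_versions`, which is not part of RealtimeSTT; it reads package versions with `importlib.metadata` and asks PyTorch which cuDNN it loaded via `torch.backends.cudnn.version()`. This reports the cuDNN that PyTorch sees, which is not necessarily the one ctranslate2 loads, so compare the result against CTranslate2's own installation notes.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Installed distribution version, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def gpu_stack_versions() -> Dict[str, Optional[str]]:
    """Hypothetical helper: collect versions relevant to ctranslate2/cuDNN mismatches."""
    report: Dict[str, Optional[str]] = {
        "ctranslate2": installed_version("ctranslate2"),
        "faster-whisper": installed_version("faster-whisper"),
        "torch": installed_version("torch"),
        "cudnn_via_torch": None,
    }
    try:
        import torch

        # Integer encoding of the cuDNN version, or None when cuDNN is unavailable.
        cudnn = torch.backends.cudnn.version()
        if cudnn is not None:
            report["cudnn_via_torch"] = str(cudnn)
    except ImportError:
        pass  # PyTorch not installed
    except RuntimeError as err:
        # PyTorch may raise here if the loaded cuDNN does not match its build.
        report["cudnn_via_torch"] = f"error: {err}"
    return report


if __name__ == "__main__":
    for name, ver in gpu_stack_versions().items():
        print(f"{name}: {ver}")
```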