RealtimeSTT by KoljaB

Speech-to-text library for realtime applications

created 1 year ago
8,243 stars

Top 6.4% on sourcepulse

Project Summary

This library provides a robust, low-latency speech-to-text (STT) solution for real-time applications, featuring voice activity detection (VAD) and wake word activation. It's designed for voice assistants and applications requiring fast, accurate speech-to-text conversion, offering an easy-to-use interface for developers.

How It Works

RealtimeSTT chains several components. Voice activity detection runs in two stages: WebRTCVAD makes the initial detection and SileroVAD verifies it for better accuracy. Transcription is handled by Faster-Whisper, which supports GPU acceleration for real-time performance. Wake word detection uses either Porcupine or OpenWakeWord, so the activation method can be chosen per application.
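
The two-stage voice activity check described above can be illustrated with a minimal sketch. This is not the library's implementation: `cheap_vad` and `strict_vad` are hypothetical energy-based stand-ins for WebRTCVAD and SileroVAD, and the thresholds are arbitrary. The point is only the cascade pattern, where the more expensive check runs solely on frames the fast check has already flagged.

```python
from typing import Callable, Iterable, List

Frame = List[float]  # one chunk of audio samples in the range [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def cheap_vad(frame: Frame, threshold: float = 0.01) -> bool:
    """Hypothetical stand-in for the fast first-pass detector."""
    return rms(frame) >= threshold


def strict_vad(frame: Frame, threshold: float = 0.05) -> bool:
    """Hypothetical stand-in for the slower, more accurate detector."""
    return rms(frame) >= threshold


def cascade(
    frames: Iterable[Frame],
    first: Callable[[Frame], bool] = cheap_vad,
    second: Callable[[Frame], bool] = strict_vad,
) -> List[bool]:
    """Mark a frame as speech only if both stages agree."""
    decisions = []
    for frame in frames:
        # `and` short-circuits, so `second` is skipped when `first` says no.
        decisions.append(first(frame) and second(frame))
    return decisions


silence = [0.0] * 160
hum = [0.02] * 160          # loud enough for stage one, not for stage two
speech = [0.2, -0.2] * 80   # passes both stages
print(cascade([silence, hum, speech]))
```

In this sketch the low-level hum passes the first gate but is rejected by the second, which is the kind of false positive a stricter second stage is meant to filter out.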

Quick Start & Requirements

  • Installation: pip install RealtimeSTT
  • Prerequisites:
    • Python 3.x
    • Recommended: NVIDIA GPU with CUDA 11.8 or 12.x installed for optimal performance.
    • Linux: sudo apt-get update && sudo apt-get install python3-dev portaudio19-dev
    • macOS: brew install portaudio
  • GPU Support: Requires manually installing a CUDA-enabled PyTorch build that matches your CUDA version (e.g., pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118). A full CUDA setup also involves installing the NVIDIA CUDA Toolkit and cuDNN.
  • Docs: https://github.com/KoljaB/RealtimeSTT
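
Once the package is installed, a first transcription might look like the sketch below. It assumes the `AudioToTextRecorder` class with `model` and `device` arguments and a blocking `text()` method, as shown in the project README; check the Docs link for the version you install. `pick_device` and `transcribe_once` are helper names invented for this example.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # optional; only needed for the GPU path
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def transcribe_once(model: str = "tiny") -> str:
    """Record one utterance from the default microphone and return its text."""
    # Imported lazily so this file can be loaded without the library installed.
    from RealtimeSTT import AudioToTextRecorder

    # The recorder is used as a context manager so it shuts down cleanly.
    with AudioToTextRecorder(model=model, device=pick_device()) as recorder:
        return recorder.text()  # blocks until speech has started and ended
```

Because the library uses multiprocessing, the README's examples call the recorder from under an `if __name__ == '__main__':` guard; calling `transcribe_once()` the same way is a sensible default.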

Highlighted Details

  • Supports multiple STT models (tiny to large-v2) and language auto-detection.
  • Offers real-time transcription while recording, optionally using a separate (typically smaller) model for the live output to improve responsiveness.
  • Provides callbacks for various events (recording start/stop, VAD start/stop, wake word detection).
  • Includes wake word support with customizable sensitivity and backends (Porcupine, OpenWakeWord).
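
The options listed above map onto constructor arguments. The sketch below only assembles a keyword dictionary, so it runs without a microphone or the library installed. The parameter names (`wake_words`, `wake_words_sensitivity`, `wakeword_backend`, `enable_realtime_transcription`, `realtime_model_type`, and the `on_*` callbacks) follow the project README as best I recall it and should be verified against the installed version; `build_recorder_kwargs` is a hypothetical helper.

```python
from typing import Any, Callable, Dict, Optional


def build_recorder_kwargs(
    wake_word: Optional[str] = None,
    sensitivity: float = 0.6,
    realtime: bool = False,
    on_partial: Optional[Callable[[str], None]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments intended for AudioToTextRecorder (assumed names)."""
    kwargs: Dict[str, Any] = {
        "model": "large-v2",
        "language": "",  # empty string: let the model auto-detect the language
        "on_recording_start": lambda: print("recording started"),
        "on_recording_stop": lambda: print("recording stopped"),
    }
    if wake_word:
        kwargs.update(
            wakeword_backend="pvporcupine",  # assumed value for the Porcupine backend
            wake_words=wake_word,
            wake_words_sensitivity=sensitivity,
            on_wakeword_detected=lambda: print("wake word heard"),
        )
    if realtime:
        kwargs.update(
            enable_realtime_transcription=True,
            realtime_model_type="tiny",  # separate, smaller model for live text
            on_realtime_transcription_update=on_partial or print,
        )
    return kwargs


config = build_recorder_kwargs(wake_word="jarvis", realtime=True)
print(sorted(config))
```

The resulting dictionary would then be passed as `AudioToTextRecorder(**config)`. Keeping the live model small while the final pass uses a larger one is the trade-off the "separate models" bullet refers to.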

Maintenance & Community

  • Active development with recent updates (v0.3.100).
  • Contributions are welcome; Docker support provided by Steven Linn.
  • Links to related projects like Linguflex and RealtimeTTS are available.

Licensing & Compatibility

  • License: MIT
  • Compatible with commercial use and closed-source applications.

Limitations & Caveats

  • The server component does not yet handle concurrent requests.
  • Real-time transcription with the main model can create high GPU loads.
  • A mismatch between ctranslate2 and cuDNN versions can cause loading errors, requiring downgrades or upgrades.
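
When chasing the ctranslate2/cuDNN loading error, it helps to see which versions are installed before changing anything. The sketch below only reports versions; it does not encode any compatibility rule, since the exact pairing depends on the release. Note the assumption that the cuDNN version PyTorch reports is relevant: ctranslate2 may load a different cuDNN from the system, so treat the output as a starting point. `package_version`, `cudnn_version`, and `version_report` are names made up for this example.

```python
from importlib import metadata
from typing import Dict, Optional, Union


def package_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def cudnn_version() -> Optional[int]:
    """cuDNN version as seen by PyTorch (e.g. 90100), or None if unknown."""
    try:
        import torch

        return torch.backends.cudnn.version()  # None when cuDNN is unavailable
    except Exception:
        # torch is missing, or it failed while probing cuDNN.
        return None


def version_report() -> Dict[str, Union[str, int, None]]:
    """Collect the versions most relevant to the loading error."""
    return {
        "ctranslate2": package_version("ctranslate2"),
        "faster-whisper": package_version("faster-whisper"),
        "cudnn": cudnn_version(),
    }


for name, version in version_report().items():
    print(f"{name}: {version if version is not None else 'not found'}")
```

With the versions in hand, the project documentation can be consulted to decide whether to change the ctranslate2 version or the cuDNN version.
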

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 7
  • Star History: 1,398 stars in the last 90 days
