RealtimeSTT by KoljaB

Speech-to-text library for realtime applications

Created 2 years ago

9,287 stars

Top 5.5% on SourcePulse

2 Experts Love This Project

chiphuyen

Author of "AI Engineering", "Designing Machine Learning Systems"

transitive-bullshit

Founder of Agentic

Project Summary

This library provides a robust, low-latency speech-to-text (STT) solution for real-time applications, featuring voice activity detection (VAD) and wake word activation. It's designed for voice assistants and applications requiring fast, accurate speech-to-text conversion, offering an easy-to-use interface for developers.

How It Works

RealtimeSTT leverages a multi-component architecture for efficient processing. Voice Activity Detection is handled by a combination of WebRTCVAD for initial detection and SileroVAD for enhanced accuracy. Speech-to-text transcription is powered by Faster-Whisper, known for its GPU-accelerated, real-time performance. Wake word detection is supported by either Porcupine or OpenWakeWord, providing flexibility in activation methods.
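The two-stage voice activity detection described above can be pictured as a cheap first-pass gate followed by a stricter check that only runs on frames the first pass flags. The following is a minimal sketch of that idea in plain Python, not the library's own code: `cheap_vad` and `accurate_vad` are hypothetical energy-threshold stand-ins for WebRTCVAD and SileroVAD, and the thresholds are made up for illustration.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one chunk of audio samples in [-1.0, 1.0]


def mean_energy(frame: Frame) -> float:
    """Average squared amplitude of a frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def cheap_vad(frame: Frame) -> bool:
    # Hypothetical stand-in for the fast first-pass detector.
    return mean_energy(frame) > 0.001


def accurate_vad(frame: Frame) -> bool:
    # Hypothetical stand-in for the slower, stricter second-pass detector.
    return mean_energy(frame) > 0.01


def two_stage_vad(
    frames: Sequence[Frame],
    first: Callable[[Frame], bool] = cheap_vad,
    second: Callable[[Frame], bool] = accurate_vad,
) -> List[bool]:
    """Run the second detector only on frames the first one flags."""
    return [first(f) and second(f) for f in frames]


silence = [0.0] * 160
hiss = [0.05] * 160    # low energy: passes stage one only
speech = [0.3] * 160   # high energy: passes both stages
decisions = two_stage_vad([silence, hiss, speech])
print(decisions)
```

Because `and` short-circuits, the more expensive second detector is skipped for frames the first detector already rejects, which is the point of stacking the two.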

Quick Start & Requirements

Installation: pip install RealtimeSTT
Prerequisites:
- Python 3.x
- Recommended: NVIDIA GPU with CUDA 11.8 or 12.X installed for optimal performance.
- Linux: sudo apt-get update && sudo apt-get install python3-dev portaudio19-dev
- macOS: brew install portaudio
GPU Support: CUDA acceleration requires manually installing a PyTorch build that matches the installed CUDA version (e.g., for CUDA 11.8: pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118). A full CUDA setup also involves installing the NVIDIA CUDA Toolkit and cuDNN.
Docs: https://github.com/KoljaB/RealtimeSTT
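After installation, it can be useful to confirm whether a CUDA-enabled PyTorch build is actually usable before expecting GPU performance. The sketch below is one way to check, assuming only that PyTorch may or may not be installed; `pick_device` is a hypothetical helper name, not part of RealtimeSTT.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch is usable, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    try:
        import torch
    except Exception:
        return "cpu"  # PyTorch is present but failed to load
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Transcription would run on: {device}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch wheel is probably a CPU-only build and the CUDA-specific install command above is worth revisiting.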

Highlighted Details

Supports multiple STT models (tiny to large-v2) and language auto-detection.
Offers real-time transcription, optionally using a separate model for live updates to improve responsiveness.
Provides callbacks for various events (recording start/stop, VAD start/stop, wake word detection).
Includes wake word support with customizable sensitivity and backends (Porcupine, OpenWakeWord).
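As a rough illustration of how these features might be combined, the sketch below builds a set of options for the library's `AudioToTextRecorder` class. The parameter names (`model`, `language`, `enable_realtime_transcription`, `realtime_model_type`, `wakeword_backend`, `wake_words`, `wake_words_sensitivity`, and the `on_*` callbacks) are recalled from the project's README and should be treated as assumptions to verify against the installed version; `log_event`, `build_recorder_options`, and `main` are hypothetical helpers.

```python
from typing import Any, Callable, Dict, List

events: List[str] = []  # fired event names, collected for inspection


def log_event(name: str) -> Callable[..., None]:
    """Build a callback that records its event name when fired."""
    def callback(*args: Any, **kwargs: Any) -> None:
        events.append(name)
    return callback


def build_recorder_options() -> Dict[str, Any]:
    # Assumed parameter names; check them against the installed release.
    return {
        "model": "large-v2",                    # main transcription model
        "language": "",                         # assumed: empty = auto-detect
        "enable_realtime_transcription": True,
        "realtime_model_type": "tiny",          # smaller model for live text
        "wakeword_backend": "pvporcupine",      # assumed backend identifier
        "wake_words": "jarvis",
        "wake_words_sensitivity": 0.6,
        "on_recording_start": log_event("recording_start"),
        "on_recording_stop": log_event("recording_stop"),
        "on_vad_detect_start": log_event("vad_start"),
        "on_vad_detect_stop": log_event("vad_stop"),
        "on_wakeword_detected": log_event("wakeword"),
    }


def main() -> None:
    # Needs the RealtimeSTT package and a working microphone.
    from RealtimeSTT import AudioToTextRecorder
    recorder = AudioToTextRecorder(**build_recorder_options())
    print(recorder.text())  # blocks until one utterance is transcribed


options = build_recorder_options()
options["on_recording_start"]()  # simulate the event firing once
print(events)
```

The module-level lines only simulate a callback; calling `main()` (ideally under an `if __name__ == "__main__":` guard) is what would actually start listening.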

Maintenance & Community

Active development with recent updates (v0.3.100).
Contributions are welcome; Docker support provided by Steven Linn.
Links to related projects like Linguflex and RealtimeTTS are available.

Licensing & Compatibility

License: MIT
Compatible with commercial use and closed-source applications.

Limitations & Caveats

The server component does not yet handle concurrent requests.
Real-time transcription with the main model can create high GPU loads.
A version mismatch between ctranslate2 and cuDNN can cause library loading errors; resolving it requires downgrading or upgrading one of them so the versions are compatible.
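When chasing a ctranslate2/cuDNN loading error, a first step is simply to see which versions are installed. The sketch below gathers them without failing if a package is missing. `installed_version`, `cudnn_version`, and `report` are hypothetical names, and the cuDNN value shown is the one PyTorch reports, which is an assumption about the environment and may differ from the cuDNN that ctranslate2 tries to load.

```python
import importlib
import importlib.util
from typing import Dict, Optional


def installed_version(module_name: str) -> Optional[str]:
    """Return a module's __version__, or None if it is missing or broken."""
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return None
    return getattr(module, "__version__", None)


def cudnn_version() -> Optional[int]:
    """cuDNN version as reported by PyTorch, or None if unavailable."""
    try:
        import torch
        return torch.backends.cudnn.version()
    except Exception:
        return None


report: Dict[str, object] = {
    "ctranslate2": installed_version("ctranslate2"),
    "torch": installed_version("torch"),
    "cudnn (via torch)": cudnn_version(),
}
for name, version in report.items():
    print(f"{name}: {version}")
```

A `None` entry means the package could not be found or loaded, which is itself a useful hint about where the mismatch lies.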

Health Check

Last Commit: 6 months ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0

Star History

307 stars in the last 30 days

Explore Similar Projects

susi_translator by susiai

Real-time audio transcription system

Created 8 years ago

Updated 1 year ago

Transcribro by soupslurpr

Android app for private, on-device speech recognition

Created 1 year ago

Updated 4 months ago

LiveWhisper by Nikorasu

Live transcription tool using OpenAI's Whisper

Created 3 years ago

Updated 5 months ago

RealtimeSTT_LLM_TTS by Ikaros-521

Realtime STT/TTS pipeline for cross-network, real-time conversations

Created 1 year ago

Updated 1 year ago

AIVoiceChat by KoljaB

Voice chat for low-latency AI companion interaction

Created 2 years ago

Updated 6 months ago

Starred by Jong Wook Kim (Research Scientist at OpenAI).

realtime-transcription-fastrtc by sofdog-gh

Real-time transcription tool using local Whisper models

Created 10 months ago

Updated 6 months ago

speech-to-text by reriiasu

Real-time transcription tool using faster-whisper

Created 2 years ago

Updated 1 year ago

whisper_mic by mallorbc

Microphone interface for OpenAI's Whisper speech-to-text model

Created 3 years ago

Updated 1 year ago

Starred by Yaowei Zheng (Author of LLaMA-Factory) and Omar Sanseviero (DevRel at Google DeepMind).

mini-omni by gpt-omni

Open-source multimodal LLM for real-time speech interaction

Created 1 year ago

Updated 1 year ago

Starred by Amin Ahmad (Cofounder of Vectara) and Jeremy Howard (Cofounder of fast.ai).

whisper_streaming by ufal

Real-time streaming for long speech-to-text transcription/translation

Created 2 years ago

Updated 2 months ago

Starred by Pietro Schirano (Founder of MagicPath), Luis Capelo (Cofounder of Lightning AI), and 1 more.

whisper_real_time by davabase

Demo for real-time speech-to-text using OpenAI's Whisper

Created 3 years ago

Updated 9 months ago

Starred by Luis Capelo (Cofounder of Lightning AI) and Matt Schrage (Cofounder of Fig).

WhisperLive by collabora

Real-time transcription app using OpenAI's Whisper

Created 2 years ago

Updated 3 months ago
