realtime-transcription-fastrtc  by sofi444

Real-time transcription tool using local Whisper models

created 5 months ago
670 stars

Top 51.3% on sourcepulse

Project Summary

This project provides real-time speech transcription using the FastRTC framework for audio streaming and local Hugging Face Transformers models, primarily Whisper. It's designed for developers and researchers needing efficient, on-device speech-to-text capabilities with customizable ASR models and streaming parameters.

How It Works

The system leverages FastRTC to manage live audio streams, including features like Voice Activity Detection (VAD). It integrates with Hugging Face Transformers to run various Automatic Speech Recognition (ASR) models locally. The architecture prioritizes real-time performance by configuring ASR models for a batch size of 1, processing audio chunks as they become available.
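The batch-size-1 setup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names `to_asr_input` and `build_transcriber` are hypothetical, the incoming audio is assumed to be a mono int16 NumPy array paired with its sample rate, and the `language` default is a placeholder. The Transformers calls used are the standard `pipeline("automatic-speech-recognition", ...)` entry point and its dict input form.

```python
import numpy as np


def to_asr_input(sample_rate: int, audio: np.ndarray) -> dict:
    """Convert one mono audio chunk into the dict form accepted by the
    Transformers ASR pipeline ({"sampling_rate": ..., "raw": ...})."""
    samples = np.asarray(audio).reshape(-1)  # assumes mono, e.g. shape (1, n)
    if np.issubdtype(samples.dtype, np.integer):
        # Assumes int16 PCM: scale to float32 in [-1.0, 1.0).
        samples = samples.astype(np.float32) / 32768.0
    else:
        samples = samples.astype(np.float32)
    return {"sampling_rate": sample_rate, "raw": samples}


def build_transcriber(model_id: str = "openai/whisper-large-v3-turbo",
                      language: str = "english"):
    """Return a function that transcribes one chunk at a time.

    Hypothetical helper: the import is lazy because constructing the
    pipeline downloads and loads the model.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)

    def transcribe(sample_rate: int, audio: np.ndarray) -> str:
        # batch_size=1: each chunk is processed as soon as it arrives
        # instead of waiting to fill a larger batch.
        result = asr(
            to_asr_input(sample_rate, audio),
            batch_size=1,
            generate_kwargs={"language": language},
        )
        return result["text"]

    return transcribe
```

In a streaming setup, `transcribe` would be called once per speech segment after VAD decides that the speaker has paused; how the project wires this into FastRTC is not shown in this summary.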

Quick Start & Requirements

  • Install via uv (recommended) or pip:
    uv venv --python 3.11 && source .venv/bin/activate
    uv pip install -r requirements.txt
    
    or
    python -m venv .venv && source .venv/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt
    
  • Prerequisites: Python >= 3.10, ffmpeg (install via brew on macOS or apt on Debian/Ubuntu).
  • Configuration: Create a .env file with UI_MODE (e.g., fastapi, gradio), APP_MODE (local or deployed), MODEL_ID (e.g., openai/whisper-large-v3-turbo), SERVER_NAME, and PORT.
  • Launch: python main.py
  • Documentation: The README's "FastRTC documentation" link appears to point to uv rather than to FastRTC; the README implies FastRTC docs exist but does not link them.
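As an illustration of the configuration step, the sketch below parses a .env file with the five variables listed above and validates them. The variable names and the `local`/`deployed` values come from the README summary; the `SERVER_NAME` and `PORT` values, the `Settings` class, and both helper functions are hypothetical, and the project itself may load its configuration differently (for example with a dotenv library).

```python
from dataclasses import dataclass

# Example .env contents. SERVER_NAME and PORT are placeholder values.
EXAMPLE_ENV = """\
UI_MODE=fastapi
APP_MODE=local
MODEL_ID=openai/whisper-large-v3-turbo
SERVER_NAME=localhost
PORT=7860
"""


@dataclass(frozen=True)
class Settings:
    ui_mode: str
    app_mode: str
    model_id: str
    server_name: str
    port: int


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env: dict[str, str]) -> Settings:
    """Build Settings from parsed values; raises KeyError on a missing key."""
    app_mode = env["APP_MODE"]
    if app_mode not in {"local", "deployed"}:
        raise ValueError(f"APP_MODE must be 'local' or 'deployed', got {app_mode!r}")
    return Settings(
        ui_mode=env["UI_MODE"],
        app_mode=app_mode,
        model_id=env["MODEL_ID"],
        server_name=env["SERVER_NAME"],
        port=int(env["PORT"]),
    )


settings = load_settings(parse_env(EXAMPLE_ENV))
```

Failing fast on a missing or malformed value surfaces configuration mistakes before the model is loaded, which can take a while for large Whisper checkpoints.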

Highlighted Details

  • Supports local Whisper models (e.g., openai/whisper-large-v3-turbo) for multi-lingual transcription.
  • Configurable ASR parameters for real-time performance, including batch size and target language.
  • Offers flexible UI modes (fastapi or gradio).

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires a local installation of ffmpeg and Python 3.10+. The README notes that deployed environments may need a TURN server, but the detailed instructions are linked externally and may require separate setup. The README's FastRTC documentation link appears to point to uv installation instructions rather than to FastRTC itself.
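To make the TURN server caveat concrete, here is a minimal sketch of what a deployed-mode WebRTC configuration could look like. The dictionary follows the standard WebRTC `RTCConfiguration` shape (`iceServers` entries with `urls`, `username`, and `credential`). The function name, its parameters, and the choice to return `None` in local mode are assumptions for illustration; how the project obtains credentials and passes the configuration to FastRTC is not covered in this summary.

```python
from typing import Optional


def build_rtc_configuration(
    app_mode: str,
    turn_url: Optional[str] = None,
    username: Optional[str] = None,
    credential: Optional[str] = None,
) -> Optional[dict]:
    """Return a WebRTC RTCConfiguration-style dict for deployed mode.

    Hypothetical helper: in local mode no relay is assumed to be needed,
    so None is returned.
    """
    if app_mode == "local":
        return None
    if not (turn_url and username and credential):
        raise ValueError("deployed mode requires a TURN URL, username, and credential")
    return {
        "iceServers": [
            {
                "urls": [turn_url],
                "username": username,
                "credential": credential,
            }
        ]
    }
```

A TURN relay matters in deployed mode because browsers behind NATs or firewalls often cannot reach a cloud-hosted server directly; on a local machine both peers share a network, so the relay is typically unnecessary.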

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 35 stars in the last 90 days
