realtime-transcription-fastrtc by sofdog-gh

Real-time transcription tool using local Whisper models

Created 10 months ago

694 stars

Top 49.2% on SourcePulse

1 Expert Loves This Project

Jong Wook Kim

Research Scientist at OpenAI

Project Summary

This project provides real-time speech transcription using the FastRTC framework for audio streaming and local Hugging Face Transformers models, primarily Whisper. It's designed for developers and researchers needing efficient, on-device speech-to-text capabilities with customizable ASR models and streaming parameters.

How It Works

The system leverages FastRTC to manage live audio streams, including features like Voice Activity Detection (VAD). It integrates with Hugging Face Transformers to run various Automatic Speech Recognition (ASR) models locally. The architecture prioritizes real-time performance by configuring ASR models for a batch size of 1, processing audio chunks as they become available.
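
The chunk-at-a-time flow described above can be sketched roughly as follows. This is an illustration only, not the project's code: the energy threshold is a simple stand-in for FastRTC's voice activity detection, and `fake_transcribe` is a hypothetical placeholder for a call into a local Whisper model. Each utterance is passed to the transcriber on its own, which mirrors the batch-size-1 setup.

```python
import math
from typing import Callable, Iterable, Iterator, List

Chunk = List[float]  # mono samples in [-1.0, 1.0]


def rms(chunk: Chunk) -> float:
    """Root-mean-square energy of one chunk (0.0 for an empty chunk)."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def transcribe_stream(
    chunks: Iterable[Chunk],
    transcribe: Callable[[Chunk], str],
    threshold: float = 0.01,  # assumed value, chosen for illustration
) -> Iterator[str]:
    """Buffer voiced chunks and transcribe each utterance once a pause follows."""
    utterance: Chunk = []
    for chunk in chunks:
        if rms(chunk) >= threshold:
            # Voiced audio: keep accumulating the current utterance.
            utterance.extend(chunk)
        elif utterance:
            # First quiet chunk after speech: emit one transcript (batch of 1).
            yield transcribe(utterance)
            utterance = []
    if utterance:
        # Flush speech that ran up to the end of the stream.
        yield transcribe(utterance)


def fake_transcribe(samples: Chunk) -> str:
    """Hypothetical stand-in for a local ASR model call."""
    return f"<{len(samples)} samples>"
```

In a real deployment the pause detection and audio transport would come from FastRTC, and `transcribe` would wrap the configured Hugging Face model.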

Quick Start & Requirements

Install with uv (recommended) or pip.

With uv:

uv venv --python 3.11 && source .venv/bin/activate
uv pip install -r requirements.txt

With pip:

python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Prerequisites: Python >= 3.10, ffmpeg (install via brew on macOS or apt on Debian/Ubuntu).
Configuration: Create a .env file with UI_MODE (e.g., fastapi, gradio), APP_MODE (local or deployed), MODEL_ID (e.g., openai/whisper-large-v3-turbo), SERVER_NAME, and PORT.
Launch: python main.py
Documentation: The README's "FastRTC documentation" link appears to point to uv rather than to FastRTC. The README implies that FastRTC docs exist but does not link them directly.
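
As a rough illustration of the configuration step, the sketch below parses a .env-style text and validates the five variables named above. The variable names come from the summary; the default values, the `Settings` class, and the helper names `parse_env` and `load_settings` are assumptions made for this example, and the project may read its configuration differently.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass(frozen=True)
class Settings:
    ui_mode: str
    app_mode: str
    model_id: str
    server_name: str
    port: int


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from a mapping; every default here is an assumption."""
    ui_mode = env.get("UI_MODE", "fastapi")
    app_mode = env.get("APP_MODE", "local")
    # Assumed to be the full set of accepted values, based on the summary.
    if ui_mode not in {"fastapi", "gradio"}:
        raise ValueError(f"unsupported UI_MODE: {ui_mode!r}")
    if app_mode not in {"local", "deployed"}:
        raise ValueError(f"unsupported APP_MODE: {app_mode!r}")
    return Settings(
        ui_mode=ui_mode,
        app_mode=app_mode,
        model_id=env.get("MODEL_ID", "openai/whisper-large-v3-turbo"),
        server_name=env.get("SERVER_NAME", "localhost"),  # assumed default
        port=int(env.get("PORT", "7860")),  # assumed default
    )


EXAMPLE_ENV = """
# Example values only
UI_MODE=fastapi
APP_MODE=local
MODEL_ID=openai/whisper-large-v3-turbo
SERVER_NAME=localhost
PORT=7860
"""

settings = load_settings(parse_env(EXAMPLE_ENV))
```

The same `load_settings` function also accepts `os.environ`, so values exported in the shell can be validated the same way.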

Highlighted Details

Supports local Whisper models (e.g., openai/whisper-large-v3-turbo) for multi-lingual transcription.
Configurable ASR parameters for real-time performance, including batch size and target language.
Offers flexible UI modes (fastapi or gradio).
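
To make the batch size and target language parameters concrete, here is a minimal sketch using the Hugging Face Transformers ASR pipeline. The helper names `build_call_kwargs` and `transcribe_file` are hypothetical, and the project may wire these options differently. `transformers` is imported lazily so the parameter helper can be used without the library installed.

```python
from typing import Any, Dict, Optional


def build_call_kwargs(language: Optional[str] = None, batch_size: int = 1) -> Dict[str, Any]:
    """Assemble call-time options for an ASR pipeline (hypothetical helper)."""
    generate_kwargs: Dict[str, str] = {"task": "transcribe"}
    if language:
        # Whisper accepts a language name or code such as "english" or "en".
        generate_kwargs["language"] = language
    return {"batch_size": batch_size, "generate_kwargs": generate_kwargs}


def transcribe_file(
    path: str,
    model_id: str = "openai/whisper-large-v3-turbo",
    language: Optional[str] = None,
) -> str:
    """Transcribe one audio file with a local model; requires transformers and ffmpeg."""
    from transformers import pipeline  # lazy import: heavy optional dependency

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(path, **build_call_kwargs(language))
    return result["text"]
```

Leaving `language` unset lets Whisper detect the language itself, while a fixed value avoids per-utterance detection when the speaker's language is known in advance.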

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Running the project requires ffmpeg and Python 3.10+ installed locally. For deployed environments, the README mentions that a TURN server may be needed; the setup instructions are linked externally rather than included, so that part has to be configured separately.
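
The two local requirements can be checked before launch with a small preflight script. This is a sketch, not part of the project; the `preflight` function and its injectable parameters are assumptions introduced here to make the check easy to test.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def preflight(
    version: Optional[Tuple[int, int]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    if current < (3, 10):
        problems.append(f"Python 3.10+ required, found {current[0]}.{current[1]}")
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```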

Health Check

Last Commit

6 months ago

Responsiveness

1 day

Star History

4 stars in the last 30 days