speech-to-text by reriiasu

Real-time transcription tool using faster-whisper

Created 2 years ago

611 stars

Top 53.7% on SourcePulse

Project Summary

This project provides a real-time speech-to-text transcription system built on the Faster-Whisper model, with Silero VAD used to detect speech in the incoming audio. It targets users who need accurate, low-latency transcription, and offers an HTML-based GUI for configuration plus an optional OpenAI API integration for proofreading the transcribed text.

How It Works

The system captures microphone audio with sounddevice and uses Silero VAD to segment speech and discard silence. Each speech segment is then passed to Faster-Whisper for transcription. The architecture prioritizes speed: the project claims sub-second transcription for well-separated sentences. It also exposes advanced Faster-Whisper options such as repetition_penalty and no_repeat_ngram_size.
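The segment-then-transcribe flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: speech_segments and transcribe_segment are hypothetical helper names, the per-chunk probabilities are assumed to come from a VAD such as Silero, and the chunk length, threshold, and penalty values are placeholder assumptions.

```python
from typing import Iterable, List, Tuple


def speech_segments(
    probs: Iterable[float],
    chunk_seconds: float = 0.032,   # assumed chunk length (512 samples at 16 kHz)
    threshold: float = 0.5,         # assumed speech-probability cutoff
    min_silence_chunks: int = 3,    # assumed silence run that closes a segment
) -> List[Tuple[float, float]]:
    """Group per-chunk speech probabilities into (start, end) times in seconds.

    A segment opens at the first chunk at or above `threshold` and closes once
    `min_silence_chunks` consecutive chunks fall below it. Silence is dropped.
    """
    segments: List[Tuple[float, float]] = []
    start = None        # index of the first chunk of the open segment, if any
    last_speech = -1    # index of the most recent chunk classified as speech
    silence = 0         # consecutive below-threshold chunks inside a segment
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_chunks:
                segments.append((start * chunk_seconds, (last_speech + 1) * chunk_seconds))
                start = None
    if start is not None:  # the audio ended while a segment was still open
        segments.append((start * chunk_seconds, (last_speech + 1) * chunk_seconds))
    return segments


def transcribe_segment(model, audio) -> str:
    """Transcribe one speech segment with a faster_whisper.WhisperModel.

    `audio` is expected to be 16 kHz mono float32 samples. The two decoding
    options are the ones named in the summary; the values are illustrative only.
    """
    segments, _info = model.transcribe(
        audio,
        repetition_penalty=1.1,
        no_repeat_ngram_size=3,
    )
    return "".join(segment.text for segment in segments)
```

With faster-whisper installed, a model could be created once with WhisperModel("large-v3") and reused across calls to transcribe_segment, since loading the model for every segment would defeat the low-latency goal.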

Quick Start & Requirements

Install from the root of a cloned copy of the repository with pip install .
Windows users can run run.bat for automated setup and execution.
Requires Python.
GPU with CUDA (tested with CUDA 11.7 on NVIDIA GeForce RTX 3060 12GB) is recommended for optimal performance.
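Because a CUDA GPU is recommended rather than required, one way to choose model settings at startup is sketched below. This is an assumption about how one might do it, not the project's own logic: pick_device and detect_cuda_devices are hypothetical helpers, and the compute types are common faster-whisper choices rather than values taken from the README.

```python
def pick_device(cuda_device_count: int) -> dict:
    """Return WhisperModel keyword arguments for the available hardware."""
    if cuda_device_count > 0:
        # Half precision is a common choice on recent NVIDIA GPUs.
        return {"device": "cuda", "compute_type": "float16"}
    # 8-bit quantization keeps CPU inference reasonably fast.
    return {"device": "cpu", "compute_type": "int8"}


def detect_cuda_devices() -> int:
    """Count CUDA devices visible to CTranslate2, the backend of faster-whisper.

    Returns 0 when ctranslate2 is not installed.
    """
    try:
        import ctranslate2
    except ImportError:
        return 0
    return ctranslate2.get_cuda_device_count()
```

The result can be unpacked into the model constructor, for example WhisperModel("large-v3", **pick_device(detect_cuda_devices())).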

Highlighted Details

Supports real-time transcription from microphone input.
Enables transcription from various audio file formats (WAV, MP3, OGG).
Features audio/text synchronization with word highlighting.
Integrates OpenAI API for text proofreading.
Supports multiple Faster-Whisper models, including "large-v3" and "Faster Distil-Whisper".
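The audio/text synchronization feature can be illustrated with a small lookup that maps a playback position to the word to highlight. This is a sketch under assumptions: it presumes word-level timestamps are available (faster-whisper can produce them when transcribe is called with word_timestamps=True), and active_word and the Word tuple layout are hypothetical, not the project's implementation.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Word = Tuple[float, float, str]  # (start seconds, end seconds, text)


def active_word(words: List[Word], t: float) -> Optional[int]:
    """Return the index of the word being spoken at time `t`, or None.

    `words` must be sorted by start time and must not overlap.
    """
    starts = [w[0] for w in words]
    i = bisect_right(starts, t) - 1   # last word starting at or before t
    if i >= 0 and t < words[i][1]:
        return i
    return None                       # t falls in a gap, before, or after all words
```

A player would call active_word on each time update and move the highlight whenever the returned index changes. For long transcripts the list of start times could be built once instead of on every call.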

Maintenance & Community

The README records frequent updates (e.g., support for new models, file formats, WebSocket integration), although the most recent commit is about a year old.
No specific community links (Discord/Slack) or notable contributors are mentioned.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

Transcription from audio files is currently limited to WAV, MP3, and OGG formats.
OpenAI API integration requires a separate API key.
The README does not mention macOS or Linux setup; the only automated setup script, run.bat, is for Windows.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Star History

4 stars in the last 30 days