Offline speech-to-text tool for local audio/video transcription
Top 13.5% on sourcepulse
This tool provides an offline, local voice recognition service that converts audio/video into text, with support for JSON, SRT, and plain text output formats. It's designed for users needing to self-host a speech-to-text solution, offering accuracy comparable to OpenAI's API, and is particularly useful for developers and researchers working with audio data.
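As a rough illustration of the three output formats, the sketch below renders a list of timed segments as plain text, SRT, and JSON. The function names and the JSON field names (start, end, text) are assumptions made for this example and are not taken from the project's code; only the SRT layout follows the usual SubRip convention.

```python
import json
from typing import List, Tuple

# One segment is (start_seconds, end_seconds, text). This shape is an
# assumption for the example, not the project's internal representation.
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_text(segments: List[Segment]) -> str:
    """Plain text: one segment per line, timestamps dropped."""
    return "\n".join(text for _, _, text in segments)


def to_srt(segments: List[Segment]) -> str:
    """SRT: numbered cues separated by a blank line."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"


def to_json(segments: List[Segment]) -> str:
    """JSON: a list of objects with hypothetical field names."""
    rows = [{"start": s, "end": e, "text": t} for s, e, t in segments]
    return json.dumps(rows, ensure_ascii=False)


if __name__ == "__main__":
    demo = [(0.0, 2.5, "Hello."), (2.5, 4.0, "World.")]
    print(to_srt(demo))
```

Keeping the segments in one neutral shape and formatting them at the end means the same transcription result can be served in any of the three formats.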
How It Works
The project leverages the faster-whisper open-source model, known for its efficiency and accuracy. It supports various model sizes (tiny to large-v3), allowing users to balance performance with computational resource requirements. The tool operates as a local web service, accessible via a browser interface or an API, and automatically utilizes NVIDIA GPU acceleration via CUDA if configured.
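A minimal sketch of how a model size and device might be chosen when calling faster-whisper directly. WhisperModel and its transcribe method come from the faster-whisper package; the helper pick_device, the cuda_available flag, and the float16/int8 choices are assumptions for illustration and do not describe how this tool selects its device.

```python
from typing import Iterator, Tuple


def pick_device(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, compute_type) for WhisperModel.

    Hypothetical helper: float16 on GPU and int8 on CPU are common
    choices, not settings confirmed by the project.
    """
    return ("cuda", "float16") if cuda_available else ("cpu", "int8")


def transcribe(
    audio_path: str,
    model_size: str = "tiny",
    cuda_available: bool = False,
) -> Iterator[Tuple[float, float, str]]:
    """Yield (start, end, text) for each recognized segment.

    Smaller sizes such as "tiny" run faster with lower accuracy;
    "large-v3" is slower and needs more memory.
    """
    # Imported lazily so the helper above works without the package.
    from faster_whisper import WhisperModel

    device, compute_type = pick_device(cuda_available)
    model = WhisperModel(model_size, device=device, compute_type=compute_type)

    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(audio_path)
    for segment in segments:
        yield segment.start, segment.end, segment.text.strip()
```

Calling list(transcribe("audio.wav", model_size="tiny")) would run the whole file on CPU; the file name here is a placeholder.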
Quick Start & Requirements
Install the Python dependencies with pip install -r requirements.txt.
Copy ffmpeg.exe and ffprobe.exe to the project directory.
GPU acceleration requires an NVIDIA GPU with the CUDA 11.x/12.x toolkit and cuDNN.
Model files go in the models directory.
Run python start.py to launch the local web UI.
Highlighted Details
Maintenance & Community
Built on faster-whisper, Flask, and FFmpeg.
Licensing & Compatibility
Limitations & Caveats