Real-time subtitle display for cross-platform use
Auto Caption is a cross-platform application that displays real-time subtitles generated from audio input. It is aimed at users who need live transcription for accessibility or communication, and it can be paired with several speech-to-text and translation engines.
How It Works
The software captures system or microphone audio and processes it through a selectable speech-to-text (STT) engine: cloud-based Gummy (Alibaba), local Vosk, or local SOSV (Sherpa-ONNX SenseVoice). Translation is optional and uses either a local Ollama LLM or the Google Translate API. Key design choices are modular engine support, extensive subtitle styling, a multi-language UI, and cross-platform compatibility (Windows, macOS, Linux).
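The flow described above (audio in, a selectable STT engine, an optional translation step) can be sketched as a small plug-in pipeline. This is a minimal illustration only, not Auto Caption's actual code: `SpeechEngine`, `Translator`, `CaptionPipeline`, `EchoEngine`, `TaggingTranslator`, and the `ENGINES` registry are all hypothetical names, and the stand-in engine simply decodes bytes rather than calling Gummy, Vosk, or SOSV.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Protocol


class SpeechEngine(Protocol):
    """Hypothetical interface any STT backend would implement."""
    def transcribe(self, audio_chunk: bytes) -> str: ...


class Translator(Protocol):
    """Hypothetical interface any translation backend would implement."""
    def translate(self, text: str, target_lang: str) -> str: ...


@dataclass
class Caption:
    text: str
    translation: Optional[str] = None


class EchoEngine:
    """Stand-in engine: decodes the bytes as UTF-8 instead of recognising speech."""
    def transcribe(self, audio_chunk: bytes) -> str:
        return audio_chunk.decode("utf-8")


class TaggingTranslator:
    """Stand-in translator: tags the text with the target language code."""
    def translate(self, text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"


# Registry mapping an engine name to a factory; real backends would be added here.
ENGINES: Dict[str, Callable[[], SpeechEngine]] = {"echo": EchoEngine}


class CaptionPipeline:
    def __init__(self, engine_name: str,
                 translator: Optional[Translator] = None,
                 target_lang: str = "en") -> None:
        if engine_name not in ENGINES:
            raise ValueError(f"unknown STT engine: {engine_name!r}")
        self.engine = ENGINES[engine_name]()
        self.translator = translator
        self.target_lang = target_lang

    def process(self, audio_chunk: bytes) -> Caption:
        """Transcribe one audio chunk, translating only if a translator is set."""
        text = self.engine.transcribe(audio_chunk)
        if self.translator is None:
            return Caption(text)
        return Caption(text, self.translator.translate(text, self.target_lang))


if __name__ == "__main__":
    pipeline = CaptionPipeline("echo", TaggingTranslator(), target_lang="zh")
    print(pipeline.process(b"hello world"))
```

Keeping the engine and translator behind small interfaces is one way to get the modularity the summary mentions: swapping a cloud engine for a local one would only change the registry entry, not the pipeline.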
Maintenance & Community
The project has released v1.0.0 and plans further engine development. The README provides no community channels, contributor details, or sponsorship information.
Licensing & Compatibility
License type and compatibility details are not specified in the provided README content.
Limitations & Caveats
System audio capture on macOS and Linux requires extra configuration. Vosk's recognition quality is described as poor, and its output lacks punctuation. Gummy may be unavailable or restricted outside China. Ollama translation speed depends heavily on model size; smaller models (<1B parameters) are recommended to limit latency and resource consumption. Google Translate API availability is region-dependent.