CLI tool for voice transcription using Groq or SiliconFlow
This project is a macOS-native utility for hands-free speech-to-text. You hold a key to dictate, and the recording is transcribed by a hosted speech model. It is aimed at users who rely heavily on voice input, including people with visual impairments.
How It Works
The tool captures audio when a designated key (Option/Alt) is pressed and stops upon release. It then sends this audio to either Groq's Whisper Large V3 Turbo or SiliconFlow's SenseVoiceSmall models for transcription. The choice between models allows users to prioritize speed (Groq) or accuracy and punctuation (SiliconFlow), both offered with free usage tiers.
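The hold-to-record flow above can be modeled as a small state machine. This is a minimal sketch, not the project's code. The class and method names are hypothetical, and real audio capture and global key hooks are left out; a working build would connect them to an audio library and a keyboard listener. The model identifiers are assumptions based on the two providers' public model names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Assumed model IDs for each SERVICE_PLATFORM value; verify against the project.
MODELS = {
    "groq": "whisper-large-v3-turbo",
    "siliconflow": "FunAudioLLM/SenseVoiceSmall",
}


@dataclass
class PushToTalkRecorder:
    """Hypothetical recorder: buffers audio only while the trigger key is held."""

    platform: str
    transcribe: Callable[[str, bytes], str]  # (model_id, audio) -> text
    recording: bool = False
    frames: List[bytes] = field(default_factory=list)

    def on_key_down(self) -> None:
        # Start a fresh take; key-repeat events while already held are ignored.
        if not self.recording:
            self.recording = True
            self.frames = []

    def on_audio(self, chunk: bytes) -> None:
        # Audio that arrives while the key is up is discarded.
        if self.recording:
            self.frames.append(chunk)

    def on_key_up(self) -> Optional[str]:
        # Releasing the key ends the take and sends it for transcription.
        if not self.recording:
            return None
        self.recording = False
        audio = b"".join(self.frames)
        if not audio:
            return None  # nothing was captured, so skip the API call
        return self.transcribe(MODELS[self.platform], audio)


# Usage with a stand-in transcriber instead of a real API call.
def fake_transcribe(model: str, audio: bytes) -> str:
    return f"{model}:{len(audio)} bytes"


rec = PushToTalkRecorder("groq", fake_transcribe)
rec.on_audio(b"ignored")  # key not held yet, so this is dropped
rec.on_key_down()
rec.on_audio(b"\x00" * 160)
rec.on_audio(b"\x00" * 160)
print(rec.on_key_up())  # whisper-large-v3-turbo:320 bytes
```

Keeping the transcriber as an injected callable means switching between Groq and SiliconFlow only changes which model ID (and, in practice, which client) is used. The key-handling logic stays the same.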
Quick Start & Requirements
1. Copy .env.example to .env.
2. Set SERVICE_PLATFORM to groq or siliconflow in .env.
3. Install dependencies: pip install pip-tools, then pip-compile requirements.in, then pip install -r requirements.txt.
4. Run python main.py.
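A mistyped SERVICE_PLATFORM value is an easy setup error, so it can help to check the value before running main.py. The sketch below is an illustration under stated assumptions, not the project's own loader. It hand-parses .env rather than using a library such as python-dotenv, and the function names are hypothetical. Only the two accepted values, groq and siliconflow, come from the setup steps. Any other keys in .env are simply passed through.

```python
from typing import Dict

# Accepted values taken from the setup steps; everything else is illustrative.
SUPPORTED_PLATFORMS = {"groq", "siliconflow"}


def parse_env(text: str) -> Dict[str, str]:
    """Hypothetical helper: parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. SERVICE_PLATFORM="groq".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def service_platform(env: Dict[str, str]) -> str:
    """Return the normalized platform name or raise if it is not supported."""
    platform = env.get("SERVICE_PLATFORM", "").strip().lower()
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(
            f"SERVICE_PLATFORM must be one of {sorted(SUPPORTED_PLATFORMS)}, got {platform!r}"
        )
    return platform


# Usage with an in-memory sample instead of reading a real .env file.
sample = """
# illustrative contents, not the project's .env.example
SERVICE_PLATFORM=siliconflow
"""
print(service_platform(parse_env(sample)))  # siliconflow
```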
Highlighted Details
Maintenance & Community
The project is actively maintained. Updates in January 2025 added SiliconFlow support, Windows compatibility, and several input/output options. The author welcomes contributions via fork and pull request, and problems can be reported as issues. Anyone interested in developing a Windows client can contact the author via WeChat.
Licensing & Compatibility
The project is released under the MIT license, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project is primarily focused on macOS; Windows support was added more recently. A known cursor-switching issue affects Python 3.13.1. The project also points to WhisperKeyBoard as a more feature-rich alternative.