Whisper-Input by ErlichLiu

CLI tool for voice transcription using Groq or SiliconFlow

Created 1 year ago

588 stars

Top 55.2% on SourcePulse

Project Summary

This project provides a push-to-talk speech-to-text utility, built primarily for macOS, that sends recorded speech to hosted speech-recognition models for fast transcription. It is designed for users who rely heavily on voice input, including those with visual impairments: hold a key, speak, and release to dictate text.

How It Works

The tool starts capturing audio when the designated key (Option/Alt) is pressed and stops when it is released. It then sends the recording to either Groq's Whisper Large V3 Turbo model or SiliconFlow's SenseVoiceSmall model for transcription. The choice lets users prioritize speed (Groq) or accuracy and punctuation (SiliconFlow). Both services offer a free usage tier.
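The press-and-release flow described above can be sketched as a small state machine. This is an illustrative sketch only, not the project's actual code: the class name HoldToRecord, its method names, and the transcribe callback are all hypothetical, and the keyboard listener and microphone input are left out so the example stays self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class HoldToRecord:
    """Buffers audio chunks only while the trigger key is held down.

    Hypothetical sketch: `transcribe` stands in for whatever function
    uploads the audio to the selected transcription service.
    """

    transcribe: Callable[[bytes], str]
    recording: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def on_press(self) -> None:
        # Key auto-repeat can deliver several press events; start only once.
        if not self.recording:
            self.recording = True
            self._chunks.clear()

    def on_audio(self, chunk: bytes) -> None:
        # Audio that arrives while the key is up is discarded.
        if self.recording:
            self._chunks.append(chunk)

    def on_release(self) -> Optional[str]:
        # Releasing the key ends the capture and hands the audio off.
        if not self.recording:
            return None
        self.recording = False
        audio = b"".join(self._chunks)
        return self.transcribe(audio) if audio else None
```

In a real tool, a keyboard hook would call on_press and on_release for the Option/Alt key, and an audio input stream would feed on_audio. Keeping those pieces outside the class makes the hold-to-talk logic easy to test without hardware.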

Quick Start & Requirements

Install: Clone the repository and set up a Python virtual environment.
Prerequisites: Python 3.10+ (3.12.5 recommended; 3.13.1 has known issues).
Configuration:
- Copy .env.example to .env.
- Obtain and paste an API key from either Groq (https://console.groq.com/keys) or SiliconFlow (https://cloud.siliconflow.cn/account/ak).
- Set SERVICE_PLATFORM to groq or siliconflow in .env.
Dependencies: run the following commands in order.
- pip install pip-tools
- pip-compile requirements.in
- pip install -r requirements.txt
Run: python main.py.
Docs: https://erlich.fun
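
As a rough illustration of the configuration step, the sketch below reads .env-style text and picks a provider from SERVICE_PLATFORM. Only SERVICE_PLATFORM and its two values come from the summary above. The key variable names (GROQ_API_KEY, SILICONFLOW_API_KEY), the endpoint URLs, and the model identifiers are assumptions to verify against .env.example and each provider's documentation, and parse_env and select_provider are hypothetical helpers, not the project's own functions.

```python
# Assumed endpoints, model identifiers, and key variable names.
# Check .env.example and the provider docs before relying on any of them.
PROVIDERS = {
    "groq": {
        "url": "https://api.groq.com/openai/v1/audio/transcriptions",
        "model": "whisper-large-v3-turbo",
        "key_var": "GROQ_API_KEY",
    },
    "siliconflow": {
        "url": "https://api.siliconflow.cn/v1/audio/transcriptions",
        "model": "FunAudioLLM/SenseVoiceSmall",
        "key_var": "SILICONFLOW_API_KEY",
    },
}


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def select_provider(env: dict) -> dict:
    """Return the request settings for the platform named in SERVICE_PLATFORM."""
    platform = env.get("SERVICE_PLATFORM", "").lower()
    if platform not in PROVIDERS:
        raise ValueError("SERVICE_PLATFORM must be 'groq' or 'siliconflow'")
    provider = PROVIDERS[platform]
    api_key = env.get(provider["key_var"], "")
    if not api_key:
        raise ValueError(f"{provider['key_var']} is not set")
    return {
        "platform": platform,
        "url": provider["url"],
        "model": provider["model"],
        "api_key": api_key,
    }
```

For example, select_provider(parse_env(open(".env").read())) would return the settings for whichever platform the file names, and would raise a ValueError early if the platform or its key is missing.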

Highlighted Details

Supports real-time transcription with feedback in 1-2 seconds via Groq.
Offers SenseVoiceSmall for potentially faster and more accurate results with built-in punctuation.
Includes a feature to translate transcribed text from Chinese to English.
A macOS client with a focus on accessibility is under active development.

Maintenance & Community

The project is actively maintained, with recent updates in January 2025 adding support for SiliconFlow, Windows compatibility, and various input/output options. The author welcomes contributions via fork and pull request, and problems can be reported by opening an issue. Anyone interested in developing a Windows client can contact the author via WeChat.

Licensing & Compatibility

The project is released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The project is primarily focused on macOS, with Windows support added recently. A known issue exists with Python 3.13.1 regarding cursor switching. The project acknowledges the existence of a more feature-rich alternative, WhisperKeyBoard.

Health Check

Last Commit

11 months ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days