type4me by joewongjc

macOS voice input with local and LLM-powered optimization

Created 2 weeks ago

879 stars

Top 40.8% on SourcePulse

Project Summary

Summary

Type4Me is a flexible, privacy-focused voice input tool for macOS that addresses limitations of existing solutions. It targets power users and developers who want efficient, customizable speech-to-text and command execution, and it offers local processing, LLM integration, and complete control over user data.

How It Works

The core architecture supports both on-device (SherpaOnnx with Paraformer or Zipformer models) and cloud-based (Volcengine, Deepgram) Automatic Speech Recognition (ASR). A key differentiator is its LLM integration, which enables advanced text optimization, translation, and command execution through customizable prompt templates that use context variables such as {text}, {selected}, and {clipboard}. Its plugin-based ASR provider design makes it extensible, and all user data, including credentials and history, is stored locally.
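To make the template mechanism concrete, here is a minimal sketch of how context variables might be substituted into a prompt. The placeholder names {text}, {selected}, and {clipboard} come from the description above; the function name `render_prompt`, its signature, and the single-pass substitution strategy are assumptions for illustration, not Type4Me's actual implementation. Python is used purely for illustration.

```python
import re

# Only the three variable names are taken from the summary; the rest is hypothetical.
PLACEHOLDER = re.compile(r"\{(text|selected|clipboard)\}")


def render_prompt(template: str, text: str, selected: str = "", clipboard: str = "") -> str:
    """Fill a prompt template with the recognized speech and surrounding context."""
    context = {"text": text, "selected": selected, "clipboard": clipboard}
    # Single pass: inserted values are never rescanned, so speech that happens
    # to contain "{clipboard}" is not expanded a second time.
    # Unknown placeholders such as {foo} are left untouched.
    return PLACEHOLDER.sub(lambda m: context[m.group(1)], template)


prompt = render_prompt(
    "Rewrite the selection as instructed.\nInstruction: {text}\nSelection: {selected}",
    text="make this more formal",
    selected="hey, got a sec?",
)
```

A single-pass regex is chosen here over chained `str.replace` calls so that the order of substitution cannot change the result.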

Quick Start & Requirements

  • Installation: Download the DMG for cloud-only use, or build from source for full local ASR capabilities.
  • Prerequisites: macOS 14.0+ required. Xcode Command Line Tools and CMake (brew install cmake) are necessary for source builds.
  • Setup: Building the local Sherpa engine takes approximately 5 minutes. ASR models range from 20MB to 1GB and require manual download and placement.
  • Resources: Links to model downloads (e.g., Sherpa-onnx releases) are provided.

Highlighted Details

  • On-Device ASR: Uses SherpaOnnx for fully offline speech recognition optimized for Apple Silicon, with no API keys or network connection required.
  • LLM Command Execution: Transforms voice input into LLM commands using prompt templates with dynamic context ({text}, {selected}, {clipboard}), enabling actions based on recognized speech and selected text/clipboard content.
  • Data Privacy & Control: All credentials and recognition history are stored locally (SQLite, JSON) with no telemetry or cloud synchronization; CSV export is supported.
  • Customizable Modes & Hotkeys: Offers multiple processing modes (e.g., Fast, Performance, Translation, Command) configurable with independent global hotkeys and input methods ("press-and-hold" or "toggle").
  • ASR Enhancements: Includes ASR hotword support for specialized terms and snippet replacement for custom voice shortcuts.
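The snippet replacement feature listed above can be pictured as a post-processing pass over the transcript. The sketch below is an assumption about how such a pass could work, not the project's code: the function name `apply_snippets`, the case-insensitive matching, and the longest-trigger-first rule are all illustrative choices. Hotword support is a separate mechanism that biases the recognizer itself and is not modeled here.

```python
import re


def apply_snippets(transcript: str, snippets: dict[str, str]) -> str:
    """Replace spoken trigger phrases with their configured expansions."""
    if not snippets:
        return transcript
    # Try longer triggers first so "my email address" wins over "my email".
    triggers = sorted(snippets, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in triggers) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {trigger.lower(): expansion for trigger, expansion in snippets.items()}
    # Fall back to the matched text if case folding produces an unknown key.
    return pattern.sub(lambda m: lookup.get(m.group(1).lower(), m.group(0)), transcript)


result = apply_snippets(
    "Send it to My Email please",
    {"my email": "joe@example.com"},  # hypothetical snippet table
)
```

The lookarounds keep a trigger from firing inside a longer word, which matters for short triggers.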

Maintenance & Community

The project encourages community contributions, particularly for adding support for additional ASR cloud providers. While specific community links (Discord, Slack) are absent, the README outlines contribution steps via Issues, Discussions, and Pull Requests, and mentions AI agent integration for development tasks.

Licensing & Compatibility

Licensed under the permissive MIT License, Type4Me is compatible with commercial use and closed-source linking. It requires macOS 14.0 or newer.

Limitations & Caveats

Local ASR setup involves manual model downloading and configuration. While the architecture supports numerous cloud ASR providers, only Volcengine and Deepgram currently have implemented client integrations. The application requires user intervention to bypass macOS security warnings on first launch.
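The gap between "supported by the architecture" and "implemented" is typical of plugin designs: the interface exists, but each provider still needs its own client. The following is a minimal sketch of that pattern under stated assumptions. The names `ASRProvider`, `register`, `make_provider`, and `EchoProvider` are hypothetical and do not reflect Type4Me's actual types; the echo provider stands in for a real client so the example runs without network access.

```python
from abc import ABC, abstractmethod


class ASRProvider(ABC):
    """Hypothetical provider interface; not Type4Me's real API."""

    name: str

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Return the recognized text for a chunk of audio."""


_REGISTRY: dict[str, type[ASRProvider]] = {}


def register(cls: type[ASRProvider]) -> type[ASRProvider]:
    """Class decorator that makes a provider discoverable by name."""
    _REGISTRY[cls.name] = cls
    return cls


@register
class EchoProvider(ASRProvider):
    """Stand-in client used only to exercise the registry."""

    name = "echo"

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes of audio>"


def make_provider(name: str) -> ASRProvider:
    """Instantiate a registered provider, or fail clearly if none exists."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"no client implemented for provider {name!r}") from None
```

Under this kind of design, adding a new cloud provider means writing one class that satisfies the interface and registering it, without touching the code that consumes transcripts.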

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 45
  • Issues (30d): 64
  • Star History: 886 stars in the last 17 days
