voxt by hehehai

AI voice input and translation for macOS

Created 2 weeks ago

306 stars

Top 87.9% on SourcePulse

Project Summary

Summary

Voxt is a beta macOS menu bar application for voice input and translation. It targets macOS users who want to convert speech to text, or translate it, directly within any application through a "press to talk, release to paste" workflow. It combines on-device and cloud-based STT and LLM engines, so users can choose between privacy and flexibility.

How It Works

Voxt uses the macOS Accessibility and event tap APIs to capture global hotkey triggers, and records audio through AVAudioEngine. It supports two Speech-to-Text (STT) engines: MLX Audio for private, on-device processing with downloadable models, and Apple's Direct Dictation for zero-setup convenience. For text enhancement and translation, it integrates with Apple Intelligence Foundation Models or lets users run custom local LLMs. The pipeline runs ASR first, then optional LLM enhancement (which App Branch rules can route by context), then a final translation step, while a live floating overlay gives the user feedback throughout.
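
The stage order described above (ASR, then optional LLM enhancement, then translation) can be illustrated with a minimal Python sketch. This is not Voxt's code, which is a native macOS app; the `Pipeline` class, its field names, and the stub stages are all hypothetical and only show how the stages might compose.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical type alias: a stage that maps text to text.
TextStage = Callable[[str], str]


@dataclass
class Pipeline:
    """Illustrative only: chains ASR, optional enhancement, and translation."""
    transcribe: Callable[[bytes], str]     # stand-in for an STT engine
    enhance: Optional[TextStage] = None    # optional LLM enhancement stage
    translate: Optional[TextStage] = None  # final translation stage

    def run(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        if self.enhance is not None:
            text = self.enhance(text)
        if self.translate is not None:
            text = self.translate(text)
        return text


# Stub stages so the sketch runs without any model:
# "ASR" decodes bytes, "enhancement" trims whitespace,
# "translation" upper-cases the text.
demo = Pipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    enhance=str.strip,
    translate=str.upper,
)
result = demo.run(b"  hello world  ")
```

Keeping each stage as a plain callable means a real STT engine or LLM client could be swapped in without changing the control flow, which is the property the description above implies.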

Quick Start & Requirements

  • Installation: Download the latest .zip release, unzip, and drag Voxt.app to the Applications folder.
  • Prerequisites: macOS 26.0+, Microphone permission, Accessibility permission, Speech Recognition permission (for Direct Dictation).
  • Build: Open Voxt.xcodeproj in Xcode or use xcodebuild from the terminal.
  • Links: Releases: https://github.com/hehehai/voxt/releases/latest

Highlighted Details

  • Global hotkey support for transcription and translation with configurable "Long Press" or "Tap" trigger modes.
  • "Selected-text direct translation" feature translates highlighted text and replaces it in-place.
  • Choice between on-device MLX STT/LLM models for privacy or Apple's Direct Dictation and Foundation Models for ease of use.
  • Extensive model selection for both STT (e.g., Qwen3-ASR, Parakeet) and LLM (e.g., Qwen2, Llama 3.2, Mistral 7B) with varying performance and resource trade-offs.
  • App Branch enhancement rules allow per-app or URL-specific prompt routing for LLM interactions.
  • Features a live floating overlay, local transcription history, and clipboard-safe paste functionality.
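
As a rough illustration of the App Branch idea listed above (per-app or URL-specific prompt routing), the following Python sketch picks a prompt from the first matching rule. The summary does not say how Voxt stores or matches its rules, so the `BranchRule` fields, the glob-style URL matching, the first-match-wins order, and the example identifiers are all assumptions.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass(frozen=True)
class BranchRule:
    """Hypothetical rule: selects a prompt by app identifier and/or URL."""
    prompt: str
    app_id: Optional[str] = None       # e.g. a bundle identifier (assumed)
    url_pattern: Optional[str] = None  # glob-style pattern (assumed)

    def matches(self, app_id: str, url: Optional[str]) -> bool:
        # Every condition the rule sets must hold; unset conditions are ignored.
        if self.app_id is not None and self.app_id != app_id:
            return False
        if self.url_pattern is not None:
            if url is None or not fnmatchcase(url, self.url_pattern):
                return False
        return True


def pick_prompt(rules: Sequence[BranchRule], app_id: str,
                url: Optional[str], default: str) -> str:
    """Return the prompt of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(app_id, url):
            return rule.prompt
    return default


rules = [
    BranchRule(prompt="Format as a code review comment.",
               url_pattern="https://github.com/*"),
    BranchRule(prompt="Write in a formal email tone.",
               app_id="com.apple.mail"),
]
chosen = pick_prompt(rules, "com.apple.Safari",
                     "https://github.com/hehehai/voxt",
                     default="Clean up the transcript.")
```

In this sketch a rule with neither field set would match everything, so any catch-all rule belongs at the end of the list.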

Maintenance & Community

No specific community channels (like Discord/Slack) or details on maintainers/sponsors are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license is permissive and generally compatible with commercial use and linking within closed-source applications.

Limitations & Caveats

The application is explicitly marked as "[Beta]". It requires significant macOS permissions (Accessibility, Microphone, Speech Recognition), which may be a privacy concern for some users. Performance and functionality of local models depend heavily on the user's hardware capabilities.

Health Check

  • Last Commit: 22 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 9
  • Issues (30d): 4
  • Star History: 309 stars in the last 16 days
