voxtype by peteonrails

Linux voice-to-text with push-to-talk

Created 2 months ago
405 stars

Project Summary

Voxtype is a voice-to-text utility for Linux desktops. It provides push-to-talk dictation optimized for Wayland compositors while remaining compatible with X11. It is aimed at users who want to dictate text directly into whatever application they are working in, and it is offline-first: transcription runs entirely on local models.

How It Works

Voxtype uses whisper.cpp for local, offline speech-to-text, so audio never leaves the machine and no external service is required. On Wayland compositors such as Hyprland, Sway, and River, recording is controlled through native compositor keybindings, which start and stop recording precisely. On X11 and other environments, Voxtype falls back to its built-in evdev hotkey listener. Transcribed text is delivered through a chain of output backends, tried in order: wtype (preferred on Wayland for its CJK and Unicode support), dotool (which respects the XKB keyboard layout), ydotool, and finally the clipboard.
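The fallback chain above amounts to a first-match search over installed tools. This is a minimal sketch, not Voxtype's actual selection logic: it assumes a Wayland session is detected through the WAYLAND_DISPLAY environment variable and that the clipboard fallback is wl-copy from wl-clipboard, and the name pick_output_backend is hypothetical. It only checks whether a tool is installed, so it cannot tell whether wtype actually works on a given compositor.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Order taken from the description above. Using wl-copy (wl-clipboard) as the
# clipboard fallback is an assumption of this sketch.
BACKEND_CHAIN = ("wtype", "dotool", "ydotool", "wl-copy")

# Tools that only work inside a Wayland session.
WAYLAND_ONLY = {"wtype", "wl-copy"}


def pick_output_backend(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first installed backend usable in this session, else None.

    Hypothetical helper for illustration; not Voxtype's implementation.
    """
    on_wayland = bool(env.get("WAYLAND_DISPLAY"))
    for tool in BACKEND_CHAIN:
        if tool in WAYLAND_ONLY and not on_wayland:
            continue  # e.g. on X11, skip straight to dotool/ydotool
        if which(tool) is not None:
            return tool
    return None


if __name__ == "__main__":
    print(pick_output_backend() or "no output backend found")
```

Passing the environment and the `which` lookup in as parameters keeps the selection rule testable without depending on what happens to be installed.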

Quick Start & Requirements

  • Build the release binary: cargo build --release
  • Install a typing backend such as wtype (e.g., sudo apt install wtype).
  • Download a Whisper model: ./target/release/voxtype setup --download
  • Configure compositor-specific keybindings. For the X11/evdev fallback instead, add your user to the input group (sudo usermod -aG input $USER), then log out and back in.
  • Start the daemon: ./target/release/voxtype
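The prerequisites in these steps can be checked before starting the daemon. This is a minimal sketch, assuming Python 3 is available on the target machine; the function names are hypothetical and this is not a Voxtype command. It looks for one of the typing backends on PATH and, when you rely on the evdev hotkey, checks whether the current session belongs to the input group.

```python
import grp
import os
import shutil
from typing import Iterable, List


def in_group(group_name: str, gids: Iterable[int]) -> bool:
    """True if the named group's gid is among the given gids."""
    try:
        target = grp.getgrnam(group_name).gr_gid
    except KeyError:  # the group does not exist on this system
        return False
    return target in set(gids)


def preflight(needs_evdev: bool) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not any(shutil.which(tool) for tool in ("wtype", "dotool", "ydotool")):
        problems.append("no typing backend found (install wtype, dotool, or ydotool)")
    # os.getgroups() lists the process's supplementary groups, which are set at
    # login; this is why the usermod change needs a fresh login to show up.
    if needs_evdev and not in_group("input", os.getgroups()):
        problems.append("user is not in the 'input' group (needed for the evdev hotkey)")
    return problems


if __name__ == "__main__":
    for problem in preflight(needs_evdev=True):
        print("warning:", problem)
```

If the group check still fails right after running usermod, that is expected until you log out and back in.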

Key dependencies are Rust (to build), at least one typing backend (wtype, dotool, or ydotool), wl-clipboard for the clipboard fallback, PipeWire or PulseAudio for audio capture, and glibc 2.38 or newer. GPU acceleration is available either through Vulkan, which requires the Vulkan runtime to be installed, or by building from source with one of the gpu-cuda, gpu-metal, or gpu-hipblas feature flags.
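Since the requirements call for glibc 2.38 or newer, a quick check of the host C library can rule out one cause of startup failures. A minimal sketch using Python's platform.libc_ver(); the helper names are hypothetical. A result of None means the C library is not glibc (for example musl) or could not be detected.

```python
import platform
from typing import Optional, Tuple

MIN_GLIBC = (2, 38)  # minimum version listed in the requirements above


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.38' into (2, 38) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_ok(libc: Tuple[str, str], minimum: Tuple[int, ...] = MIN_GLIBC) -> Optional[bool]:
    """True/False for glibc; None if the C library is not glibc or is undetected."""
    name, version = libc
    if name != "glibc" or not version:
        return None
    return parse_version(version) >= minimum


if __name__ == "__main__":
    libc = platform.libc_ver()
    print(libc, glibc_ok(libc))
```

Comparing integer tuples rather than strings matters here: as strings, "2.4" would sort above "2.38".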

Highlighted Details

  • Fully offline transcription using local whisper.cpp models.
  • Flexible input modes: push-to-talk (hold key) or toggle (press to start/stop).
  • Multiple output backends (wtype, dotool, ydotool, and clipboard), plus optional post-processing of transcriptions through an external command (e.g., a local LLM for grammar correction).
  • Optional Waybar integration for status monitoring.
  • Configurable audio feedback cues for recording events.
  • Optional GPU acceleration for significantly faster inference.
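Post-processing through an external command comes down to piping the transcription into the command's standard input and reading the result from its standard output. The sketch below shows that pattern under stated assumptions: the function name post_process, the 10-second timeout, and the rule of falling back to the raw text on any failure are illustrative choices, not Voxtype's documented behavior or configuration format. The demo command only uppercases its input, standing in for a real corrector such as a local LLM.

```python
import subprocess
import sys
from typing import Sequence

# Stand-in "corrector" that uppercases stdin; a real setup would invoke
# something like a local LLM CLI instead (hypothetical, not shown).
DEMO_CMD = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]


def post_process(text: str, argv: Sequence[str], timeout_s: float = 10.0) -> str:
    """Pipe a transcription through an external command and return its output.

    Falls back to the unmodified text if the command is missing, exits with
    an error, times out, or prints nothing, so the dictation is never lost.
    """
    try:
        result = subprocess.run(
            list(argv),
            input=text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return text
    cleaned = result.stdout.strip()
    if result.returncode != 0 or not cleaned:
        return text
    return cleaned


if __name__ == "__main__":
    print(post_process("hello from voxtype", DEMO_CMD))  # prints HELLO FROM VOXTYPE
```

Falling back to the raw transcription means a crashed or slow post-processor costs you only the correction, not the dictated text.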

Maintenance & Community

The project is maintained by Peter Jackson, with contributions from several others, including jvantillo, materemias, and Dan Heuckeroth, which points to active development. No dedicated community channels (such as Discord or Slack) are listed.

Licensing & Compatibility

Voxtype is released under the permissive MIT License, which allows commercial use and integration into closed-source projects, provided the copyright and permission notice is retained.

Limitations & Caveats

On KDE Plasma and GNOME, wtype may not work, in which case Voxtype falls back to dotool or ydotool. Keybindings that combine several modifiers may need extra configuration so they do not clash with window-manager shortcuts. The built-in evdev hotkey requires membership in the input group to read input devices.

Health Check

Last Commit: 20 hours ago
Responsiveness: Inactive
Pull Requests (30d): 72
Issues (30d): 40
Star History: 157 stars in the last 30 days