Voxtype (peteonrails): Linux voice-to-text with push-to-talk
Top 71.9% on SourcePulse
A voice-to-text utility designed for Linux desktop environments, Voxtype offers push-to-talk functionality optimized for Wayland compositors while maintaining compatibility with X11. It targets users seeking efficient, hands-free transcription integrated directly into their workflow, providing an offline-first solution powered by local models.
How It Works
Voxtype uses whisper.cpp for local, offline speech-to-text transcription, which keeps audio on the machine and avoids reliance on external services. On Wayland compositors such as Hyprland, Sway, and River, recording is started and stopped through the compositor's own keybindings. On X11 and other environments, it falls back to an evdev hotkey mechanism. Text output goes through a chain of backends, tried in order: wtype is preferred on Wayland for its robust CJK and Unicode support, followed by dotool (which is XKB layout aware), then ydotool, and finally a clipboard fallback.
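The output fallback order described above can be sketched in shell. This is a minimal illustration, not Voxtype's actual detection logic: the helper names have and pick_backend are hypothetical, and the sketch assumes a backend counts as usable when its binary is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Print the first typing backend found, in the order given in the text:
# wtype, then dotool, then ydotool, then the clipboard as a last resort.
# Assumption: "installed" means "usable". A real check would also need to
# confirm that the backend works under the running compositor.
pick_backend() {
    for backend in wtype dotool ydotool; do
        if have "$backend"; then
            echo "$backend"
            return 0
        fi
    done
    echo "clipboard"
}
```

Calling pick_backend prints a single backend name. Keeping the PATH probe in its own function makes the ordering easy to exercise without installing any of the tools.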
Quick Start & Requirements
Build the release binary using cargo build --release. Install a typing backend such as wtype (e.g., sudo apt install wtype). Download a Whisper model via ./target/release/voxtype setup --download. Configure compositor-specific keybindings or, for the X11/evdev fallback, add your user to the input group (sudo usermod -aG input $USER) and log out and back in so the new group membership takes effect. Run the daemon with ./target/release/voxtype.
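The quick-start steps can be collected into one script. The sketch below uses only the commands quoted above; the run and quick_start helpers and the DRY_RUN variable are illustrative and not part of Voxtype. It prints the commands by default, since two of them need sudo, and the apt line applies only to Debian-style systems.

```shell
#!/bin/sh
# Print each command by default; set DRY_RUN=0 to actually execute it.
# (DRY_RUN and the helper names are illustrative, not part of Voxtype.)
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Pass "evdev" as the first argument to include the input-group step,
# which is only needed for the X11/evdev hotkey fallback.
quick_start() {
    run cargo build --release || return 1
    run sudo apt install wtype || return 1
    run ./target/release/voxtype setup --download || return 1
    if [ "$1" = "evdev" ]; then
        run sudo usermod -aG input "${USER:-$(id -un)}" || return 1
        echo "Log out and back in for the group change to take effect."
    fi
    echo "Then start the daemon: ./target/release/voxtype"
}
```

Run quick_start from the repository root to preview the steps, or DRY_RUN=0 quick_start evdev to execute them including the group change.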
Key dependencies include Rust (to build), a typing backend (wtype, dotool, or ydotool), wl-clipboard, PipeWire or PulseAudio, and glibc 2.38 or newer. GPU acceleration is available via Vulkan (the Vulkan runtime must be installed) or by building from source with one of the feature flags gpu-cuda, gpu-metal, or gpu-hipblas.
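Whether a system meets the glibc requirement can be checked before building. The following is a sketch under the assumption that getconf is available, as it is on glibc systems; version_ge and check_glibc are hypothetical helper names, and only the major and minor components are compared.

```shell
#!/bin/sh
# Succeed if dotted version $1 is at least $2, comparing major.minor only.
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    [ "$a_major" -gt "$b_major" ] ||
        { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

# Report whether the system glibc meets the 2.38 minimum stated above.
# `getconf GNU_LIBC_VERSION` prints e.g. "glibc 2.39" on glibc systems.
check_glibc() {
    found=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')
    if [ -z "$found" ]; then
        echo "could not detect glibc (non-glibc system?)"
        return 2
    fi
    if version_ge "$found" 2.38; then
        echo "glibc $found: ok"
    else
        echo "glibc $found: older than 2.38"
        return 1
    fi
}
```

For the GPU builds, cargo's standard --features flag would presumably be combined with the flag names listed above, for example cargo build --release --features gpu-cuda; the project's own build notes should be treated as authoritative.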
Highlighted Details
Offline transcription using whisper.cpp models.
Text output through wtype, dotool, ydotool, and clipboard, with support for post-processing transcriptions via external commands (e.g., local LLMs for grammar correction).
Maintenance & Community
The project is maintained by Peter Jackson, with contributions from several individuals including jvantillo, materemias, and Dan Heuckeroth. No community channels such as Discord or Slack are mentioned.
Licensing & Compatibility
Voxtype is released under the permissive MIT License, which allows commercial use and integration into closed-source projects provided the copyright and license notice are retained.
Limitations & Caveats
On desktop environments like KDE Plasma or GNOME, wtype may not function, causing Voxtype to fall back to dotool or ydotool. Complex multi-modifier keybindings might require specific configuration to prevent interference with window manager shortcuts. The built-in evdev hotkey requires membership in the input group for access to input devices.
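The input-group requirement for the evdev hotkey can be verified from a terminal. This sketch assumes the standard id utility; has_group and check_input_group are hypothetical helper names. It distinguishes a membership that is already active in the session from one that has been assigned but still needs a fresh login.

```shell
#!/bin/sh
# Succeed if the space-separated group list in $1 contains the group $2.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# `id -nG` lists the groups of the current session; `id -nG "$user"` reads
# the group database, which already reflects a fresh `usermod -aG`.
check_input_group() {
    user=${USER:-$(id -un)}
    if has_group "$(id -nG)" input; then
        echo "input group active in this session"
    elif has_group "$(id -nG "$user")" input; then
        echo "input group assigned; log out and back in to activate it"
    else
        echo "not in input group; run: sudo usermod -aG input $user"
    fi
}
```

Matching on space-padded names avoids false positives from groups whose names merely contain the string input.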