voxtype by peteonrails

Linux voice-to-text with push-to-talk

Created 7 months ago

926 stars

Top 38.7% on SourcePulse

Project Summary

Voxtype is a voice-to-text utility for Linux desktops. It offers push-to-talk dictation optimized for Wayland compositors while remaining compatible with X11. It targets users who want transcription integrated directly into their workflow, and it is offline-first: recognition runs on local models.

How It Works

Voxtype uses whisper.cpp for local, offline speech-to-text transcription, which keeps audio on the machine and avoids reliance on external services. On Wayland compositors such as Hyprland, Sway, and River, recording start and stop are bound through the compositor's native keybindings. On X11 and other environments, it falls back to an evdev hotkey mechanism. Text output goes through a chain of backends tried in order: wtype is preferred on Wayland for its robust CJK and Unicode support, followed by dotool (which is XKB layout aware), then ydotool, and finally a clipboard fallback.
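The backend chain described above can be pictured as "try each backend in order, stop at the first one that works". The following is a minimal Python sketch of that idea, not Voxtype's actual implementation (the project itself is built with cargo). The names `emit_text` and `command_sender` are hypothetical, and the command line for each backend is left to the caller because the exact invocations Voxtype uses are not stated here.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple

# A sender takes the transcribed text and delivers it, raising on failure.
Sender = Callable[[str], None]


def command_sender(argv: List[str]) -> Sender:
    """Build a sender that pipes the text to `argv` on standard input."""
    def send(text: str) -> None:
        # check=True raises CalledProcessError on a non-zero exit status;
        # a missing executable raises FileNotFoundError (an OSError).
        subprocess.run(argv, input=text, text=True, check=True)
    return send


def emit_text(text: str, chain: Iterable[Tuple[str, Sender]]) -> Optional[str]:
    """Try each backend in order; return the name of the first that succeeds."""
    for name, send in chain:
        try:
            send(text)
        except (OSError, subprocess.SubprocessError):
            continue  # backend missing or failed: fall through to the next one
        return name
    return None  # every backend in the chain failed
```

As a usage example, `emit_text("hello", [("clipboard", command_sender(["wl-copy"]))])` would copy the text with wl-copy, which reads standard input when given no text arguments. Entries for wtype, dotool, and ydotool would be placed ahead of it in the list, each with the invocation its own documentation specifies.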

Quick Start & Requirements

Build the release binary with cargo build --release.
Install a typing backend such as wtype (on Debian or Ubuntu: sudo apt install wtype).
Download a Whisper model with ./target/release/voxtype setup --download.
Configure compositor-specific keybindings. For the X11/evdev fallback, add your user to the input group instead (sudo usermod -aG input $USER), then log out and back in.
Run the daemon with ./target/release/voxtype.

Key dependencies are a Rust toolchain for building, at least one output backend (wtype, dotool, or ydotool, with wl-clipboard for the clipboard fallback), PipeWire or PulseAudio for audio, and glibc 2.38 or newer. GPU acceleration is supported via Vulkan (runtime installation required) or by building from source with specific feature flags (gpu-cuda, gpu-metal, gpu-hipblas).
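Before building, it can help to see which of these tools are already installed. The sketch below is a small Python preflight check, written for illustration and not part of Voxtype. The function name `find_tools` is hypothetical, and the executable list assumes that wl-clipboard provides the wl-copy command.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional


def find_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


if __name__ == "__main__":
    # cargo builds the project; the others are the optional output backends.
    tools = ["cargo", "wtype", "dotool", "ydotool", "wl-copy"]
    for name, path in find_tools(tools).items():
        print(f"{name:10} {path or 'not found'}")
```

Only one typing backend needs to be present, so a "not found" line for some of the backends is not necessarily a problem.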

Highlighted Details

Fully offline transcription using local whisper.cpp models.
Flexible input modes: push-to-talk (hold key) or toggle (press to start/stop).
Advanced output options including wtype, dotool, ydotool, and clipboard, with support for post-processing transcriptions via external commands (e.g., local LLMs for grammar correction).
Optional Waybar integration for status monitoring.
Configurable audio feedback cues for recording events.
Optional GPU acceleration for significantly faster inference.
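
The difference between the two input modes listed above comes down to which key events change the recording state. The following Python sketch illustrates that logic under stated assumptions. `Mode` and `HotkeyState` are hypothetical names, and the handling of key auto-repeat is a design choice of this sketch, not something the project description confirms.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    PUSH_TO_TALK = "push_to_talk"  # record while the key is held
    TOGGLE = "toggle"              # press once to start, again to stop


class HotkeyState:
    """Turn raw key-down/key-up events into "start"/"stop" recording actions."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.recording = False
        self._key_down = False

    def on_key(self, pressed: bool) -> Optional[str]:
        """Feed one key event; return "start", "stop", or None."""
        if pressed == self._key_down:
            return None  # auto-repeat or duplicate event: no state change
        self._key_down = pressed
        if self.mode is Mode.PUSH_TO_TALK:
            self.recording = pressed
            return "start" if pressed else "stop"
        if pressed:  # toggle mode reacts to key-down only
            self.recording = not self.recording
            return "start" if self.recording else "stop"
        return None
```

In push-to-talk mode the key release ends the recording, while in toggle mode the release is ignored and the next press ends it.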

Maintenance & Community

The project is maintained by Peter Jackson, with listed contributors including jvantillo, materemias, and Dan Heuckeroth. No community channels such as Discord or Slack are mentioned.

Licensing & Compatibility

Voxtype is released under the permissive MIT License, which allows commercial use and integration into closed-source projects provided the copyright and license notice are retained.

Limitations & Caveats

On desktop environments like KDE Plasma or GNOME, wtype may not function, causing Voxtype to fall back to dotool or ydotool. Complex multi-modifier keybindings might require specific configuration to prevent interference with window manager shortcuts. The built-in evdev hotkey requires membership in the input group for access to input devices.
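
The input group requirement is easy to get wrong because usermod -aG only updates the group database, while an already running session keeps its old group list until the user logs in again. The Python sketch below checks membership as seen by the current process, so it reflects what a daemon started from the same session would see. The function name `process_in_group` is hypothetical, and this check is an illustration only, not something Voxtype is known to perform.

```python
import grp
import os


def process_in_group(group_name: str) -> bool:
    """Return True if the current process runs with `group_name` among its groups."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    # Compare against the effective GID and the supplementary group list,
    # both of which are fixed at login for the session's processes.
    return gid == os.getegid() or gid in os.getgroups()


if __name__ == "__main__":
    if process_in_group("input"):
        print("input group active in this session")
    else:
        print("input group not active: add the user, then log out and back in")
```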
