hyprwhspr by goodroot

Native speech-to-text for system-wide dictation

Created 7 months ago
955 stars

Top 38.3% on SourcePulse

Project Summary

hyprwhspr provides native, system-wide speech-to-text dictation for Arch Linux and Arch-based distributions. It processes audio locally by default, which keeps speech private and transcription fast.

How It Works

hyprwhspr installs from the Arch User Repository (AUR). By default it transcribes locally with models held in memory (Whisper via pywhispercpp, or Parakeet-v3), so transcription is fast and audio stays on the machine. It can also send audio to cloud APIs (OpenAI, Groq) or to a custom REST endpoint. Other design choices include a customizable hotkey system, an optional themed visualizer, and text replacement for punctuation and commands.
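To make the cloud path concrete, here is a minimal sketch of how a client could build a request for an OpenAI-compatible transcription endpoint. It is not taken from hyprwhspr's source: the function name `build_transcription_request`, the `ENDPOINTS` table, and the assumption that hyprwhspr talks to these exact URLs are all illustrative. The URLs themselves are the public OpenAI and Groq transcription endpoints. The request is built but not sent.

```python
import urllib.request
import uuid

# Public OpenAI-compatible transcription endpoints. Whether hyprwhspr uses
# these exact URLs is an assumption made for this sketch.
ENDPOINTS = {
    "openai": "https://api.openai.com/v1/audio/transcriptions",
    "groq": "https://api.groq.com/openai/v1/audio/transcriptions",
}


def build_transcription_request(provider, api_key, wav_bytes, model):
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    # Two form fields: the model name, then the audio file itself.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        ENDPOINTS[provider],
        data=head + wav_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the audio. With the default response format, these endpoints reply with a JSON object whose `text` field holds the transcript. A custom REST endpoint would only need a different URL in the table, assuming it accepts the same form fields.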

Quick Start & Requirements

  • Install: With an AUR helper, run yay -S hyprwhspr (stable) or yay -S hyprwhspr-git (bleeding edge).
  • Prerequisites: Arch Linux-based system. Optional GPU acceleration: NVIDIA (CUDA) or AMD/Intel (Vulkan).
  • Setup: Run hyprwhspr setup auto for defaults or hyprwhspr setup for interactive configuration.
  • First Use: Log out and back in, make sure the microphone is active, then press the default Super+Alt+D hotkey to toggle dictation.
  • Docs: The README is the primary documentation.

Highlighted Details

  • Local-First Dictation: Employs local Whisper models for privacy and speed, with optional cloud integration.
  • Extensive Customization: Supports custom hotkeys, word overrides, automatic punctuation/symbol conversion, configurable paste behavior, and auto-submit options.
  • Themed Visualizer: Provides real-time voice visualization, designed to match system themes like Omarchy.
  • Flexible Backends: Integrates pywhispercpp, Parakeet-v3, OpenAI/Groq REST APIs, and experimental Realtime WebSocket streaming.
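The word overrides and automatic punctuation conversion listed above amount to a post-processing pass over the transcript. The following is a minimal sketch of that idea, not hyprwhspr's implementation: the tables `SPOKEN_SYMBOLS` and `WORD_OVERRIDES`, their entries, and the function `apply_replacements` are hypothetical.

```python
import re

# Hypothetical spoken-phrase table; hyprwhspr's real defaults may differ.
SPOKEN_SYMBOLS = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
}

# Hypothetical user-defined overrides (misheard phrase -> intended word).
WORD_OVERRIDES = {"hyper whisper": "hyprwhspr"}


def apply_replacements(text: str) -> str:
    """Apply word overrides, then turn spoken symbol names into characters."""
    for spoken, wanted in WORD_OVERRIDES.items():
        pattern = rf"\b{re.escape(spoken)}\b"
        text = re.sub(pattern, lambda _m, w=wanted: w, text, flags=re.IGNORECASE)
    # Longest phrases first, so multi-word entries are matched before shorter ones.
    for spoken in sorted(SPOKEN_SYMBOLS, key=len, reverse=True):
        symbol = SPOKEN_SYMBOLS[spoken]
        # Also consume whitespace before the phrase so the symbol
        # attaches to the preceding word.
        pattern = rf"\s*\b{re.escape(spoken)}\b"
        text = re.sub(pattern, lambda _m, s=symbol: s, text, flags=re.IGNORECASE)
    return text


print(apply_replacements("hello comma world period"))
```

A table-driven pass like this is easy to extend from a config file, but it cannot tell a spoken command from ordinary use of the same word (for example "period" in "the trial period"), so a real tool needs some way to disable or escape individual entries.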

Maintenance & Community

The project is distributed via the AUR. Users are encouraged to report problems through GitHub issues. No community chat links (Discord/Slack) are provided.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

  • Arch-Centric: Primarily designed and optimized for Arch Linux and derivatives.
  • GPU Dependency: Larger Whisper models (large, large-v3) necessitate GPU acceleration for practical performance.
  • Experimental Features: Realtime WebSocket backend is marked as experimental.
  • Potential Conflicts: May require configuration adjustments when using keyboard remapping daemons (e.g., keyd, kmonad) or Bluetooth microphones. Persistent issues may require a full reinstall.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 14
  • Star History: 93 stars in the last 30 days
