hyprwhspr  by goodroot

Native speech-to-text for system-wide dictation

Created 4 months ago
697 stars

Top 49.0% on SourcePulse

1 Expert Loves This Project

Project Summary

hyprwhspr is a native speech-to-text tool for Arch Linux that provides fast, accurate, and private system-wide dictation. It targets users of Arch-based distributions who want voice-to-text integration, and it prioritizes local processing for privacy and low latency.

How It Works

The project is installed from the Arch User Repository (AUR). By default it runs transcription locally with in-memory models (Whisper via pywhispercpp, or Parakeet-v3), so audio stays on the machine and results come back quickly. For more flexibility, it also supports cloud-based APIs (OpenAI, Groq) and custom REST endpoints. Key features include a customizable hotkey system, an optional themed visualizer, and text replacement for punctuation and commands.
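
The local-first default described above can be pictured as a small configuration check. This is a minimal sketch, not hyprwhspr's actual code: the `Settings` class, the `resolve_backend` function, and the backend names `"parakeet"` and `"rest-api"` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical backend names, used only for this sketch.
LOCAL_BACKENDS = frozenset({"pywhispercpp", "parakeet"})
REMOTE_BACKENDS = frozenset({"rest-api"})


@dataclass(frozen=True)
class Settings:
    backend: str = "pywhispercpp"  # local-first default
    endpoint_url: str = ""         # required only for remote backends


def resolve_backend(settings: Settings) -> str:
    """Return "local" or "remote" for the configured backend."""
    if settings.backend in LOCAL_BACKENDS:
        return "local"
    if settings.backend in REMOTE_BACKENDS:
        # A remote backend is opt-in and must name its endpoint explicitly.
        if not settings.endpoint_url:
            raise ValueError("remote backend needs an endpoint_url")
        return "remote"
    raise ValueError(f"unknown backend: {settings.backend!r}")
```

With no configuration at all, `resolve_backend(Settings())` resolves to the local path. Audio would leave the machine only when a remote backend and its endpoint are both set explicitly.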

Quick Start & Requirements

  • Install: Use an AUR helper: yay -S hyprwhspr (stable) or yay -S hyprwhspr-git (bleeding edge).
  • Prerequisites: Arch Linux-based system. Optional GPU acceleration: NVIDIA (CUDA) or AMD/Intel (Vulkan).
  • Setup: Run hyprwhspr setup auto for defaults or hyprwhspr setup for interactive configuration.
  • First Use: Log out/in, ensure microphone is active, then use the default Super+Alt+D hotkey to toggle dictation.
  • Docs: The project README is the primary documentation.

Highlighted Details

  • Local-First Dictation: Runs transcription models locally for privacy and speed, with optional cloud integration.
  • Extensive Customization: Supports custom hotkeys, word overrides, automatic punctuation/symbol conversion, configurable paste behavior, and auto-submit options.
  • Themed Visualizer: Provides real-time voice visualization, designed to match system themes like Omarchy.
  • Flexible Backends: Integrates pywhispercpp, Parakeet-v3, OpenAI/Groq REST APIs, and experimental Realtime WebSocket streaming.
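
To make the word-override and punctuation-conversion features concrete, here is a minimal sketch of how spoken phrases could be rewritten after transcription. It is an illustration under stated assumptions, not the project's implementation: the function `apply_replacements` and both tables are hypothetical, and the real defaults and configuration format may differ.

```python
import re

# Hypothetical tables for illustration; not hyprwhspr's actual defaults.
WORD_OVERRIDES = {"hyper whisper": "hyprwhspr"}
SPOKEN_SYMBOLS = {
    "comma": ",",
    "period": ".",
    "question mark": "?",
    "new line": "\n",
}


def apply_replacements(text, overrides=WORD_OVERRIDES, symbols=SPOKEN_SYMBOLS):
    """Apply word overrides, then convert spoken symbol names to symbols."""
    for spoken, written in overrides.items():
        pattern = rf"\b{re.escape(spoken)}\b"
        # A function replacement keeps `written` literal (no escape handling).
        text = re.sub(pattern, lambda _m, w=written: w, text, flags=re.IGNORECASE)
    # Longest phrases first, so multi-word names win over shorter ones.
    for spoken in sorted(symbols, key=len, reverse=True):
        # Also consume whitespace before the phrase so "hello comma" -> "hello,".
        pattern = rf"\s*\b{re.escape(spoken)}\b"
        text = re.sub(pattern, lambda _m, s=symbols[spoken]: s, text, flags=re.IGNORECASE)
    return text


print(apply_replacements("hello comma world period"))  # hello, world.
```

A plain lookup like this cannot tell a dictated "period" from the word used in a sentence such as "a period of time", which is one reason a real tool would expose the table to users as configuration.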

Maintenance & Community

The project is distributed via the AUR. Users are encouraged to report problems through GitHub issues. No community chat links (Discord/Slack) are provided.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

  • Arch-Centric: Primarily designed and optimized for Arch Linux and derivatives.
  • GPU Dependency: Larger Whisper models (large, large-v3) necessitate GPU acceleration for practical performance.
  • Experimental Features: Realtime WebSocket backend is marked as experimental.
  • Potential Conflicts: May require configuration adjustments when using keyboard remapping daemons (e.g., keyd, kmonad) or Bluetooth microphones. Persistent issues may require a full reinstall.

Health Check

  • Last Commit: 22 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 18
  • Issues (30d): 19
  • Star History: 590 stars in the last 30 days