voxd  by jakovius

Voice-typing and dictation software for Linux desktops

Created 1 year ago
258 stars

Top 98.0% on SourcePulse

Summary

VOXD is user-friendly, open-source dictation software for Linux that enables speech-to-text input in any application. It targets Linux users who want efficient, hands-free typing, and offers local, offline voice processing with optional AI-driven text post-processing. Its primary benefit is voice input integrated into the desktop environment without relying on cloud services for core transcription.

How It Works

VOXD uses Whisper.cpp for fast, local, offline automatic speech recognition (ASR). It simulates keyboard input via ydotool, so transcribed text appears directly in whichever application has focus, including under Wayland. Optional AI Post-Processing (AIPP) passes the transcript to a local (llama.cpp, Ollama) or cloud LLM, which rewrites it into another form such as a poem or code. Several interfaces are available: CLI, GUI, system tray, and a beta voice activity detection (VAD) mode.
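The simulated-typing step can be sketched in a few lines of Python. This is an illustration only, not VOXD's actual code: the helper names `build_type_command` and `type_text` are hypothetical, and the sketch assumes ydotool is installed with its daemon already running.

```python
import shutil
import subprocess


def build_type_command(text: str) -> list[str]:
    """Return the argv for ydotool's `type` subcommand (hypothetical helper)."""
    # Passing a list instead of a shell string keeps quotes and spaces
    # in the transcript literal.
    return ["ydotool", "type", text]


def type_text(text: str) -> bool:
    """Type `text` into the focused window.

    Returns False when there is nothing to type, ydotool is not on PATH,
    or the command exits with a non-zero status.
    """
    if not text or shutil.which("ydotool") is None:
        return False
    result = subprocess.run(build_type_command(text), check=False)
    return result.returncode == 0
```

Because the text is injected as keyboard events rather than pasted through a toolkit API, the receiving application does not need to cooperate, which is why this approach works with any focused input field.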

Quick Start & Requirements

  • Installation: Distro-specific packages (.deb, .rpm, .pkg.tar.zst) from GitHub Releases are recommended. Alternatively, clone the repo and run ./setup.sh, which requires sudo; on Wayland, a system reboot is needed for ydotool. Installation via pipx is also supported.
  • Prerequisites: A Linux distribution. ydotool is essential for simulated typing on Wayland and needs its own setup and a reboot. No GPU is needed, and it runs on older CPUs. Optional: llama.cpp, Ollama, or cloud API keys for AIPP.
  • Setup: Install the package or run the script, then configure a global hotkey. A reboot is often required for ydotool on Wayland.
  • Links: GitHub Releases.
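
Before installing, it can help to check which session type is running and whether ydotool is already present. The following is a minimal sketch, not part of VOXD: `check_typing_prereqs` is a hypothetical helper, and it assumes the desktop exports the standard `XDG_SESSION_TYPE` environment variable.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def check_typing_prereqs(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict:
    """Report the session type and whether ydotool is on PATH."""
    env = os.environ if env is None else env
    # Usually "wayland" or "x11"; "unknown" when the variable is missing.
    session = env.get("XDG_SESSION_TYPE", "unknown").lower()
    has_ydotool = which("ydotool") is not None
    return {
        "session": session,
        "ydotool_found": has_ydotool,
        # On Wayland, simulated typing depends on ydotool being set up.
        "needs_setup": session == "wayland" and not has_ydotool,
    }


if __name__ == "__main__":
    print(check_typing_prereqs())
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.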

Highlighted Details

  • Local ASR: Whisper.cpp backend for robust, offline speech-to-text.
  • Simulated Typing: Instantly types output into any active input field, supporting X11/Wayland via ydotool.
  • AI Post-Processing (AIPP): Rewrites transcripts with local or cloud LLMs, for example into code or poetry.
  • Multi-Interface: CLI, minimal PyQt6 GUI, system tray, beta Voice Activity Detection (--flux).
  • Language Support: 99+ languages.
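
To make the AIPP idea concrete, here is a rough sketch of sending a transcript to a locally running Ollama server through its `/api/generate` endpoint. It does not reflect how VOXD wires AIPP internally: the function names, the prompt layout, and the model name `llama3.2` are all assumptions made for illustration.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_aipp_payload(transcript: str, instruction: str,
                       model: str = "llama3.2") -> dict:
    """Build a non-streaming request body (model name is an assumption)."""
    prompt = f"{instruction}\n\nTranscript:\n{transcript}"
    return {"model": model, "prompt": prompt, "stream": False}


def post_process(transcript: str, instruction: str, model: str = "llama3.2",
                 url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """Send the transcript to Ollama and return the rewritten text."""
    body = json.dumps(build_aipp_payload(transcript, instruction, model))
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the generated text arrives in one "response" field.
    return reply["response"].strip()
```

A call such as `post_process("make a loop that prints one to ten", "Rewrite the transcript as Python code.")` would then return the model's rewrite, provided the server is running and the model has been pulled.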

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap were found in the provided README.

Licensing & Compatibility

VOXD is MIT-licensed. Its ydotool dependency, used for simulated typing, is licensed under AGPLv3. MIT permits commercial use and closed-source integration, while AGPLv3 may impose obligations if ydotool's functionality is integral to a distributed application.

Limitations & Caveats

  • --flux (VAD) mode is beta.
  • Wayland users require correct ydotool setup and a system reboot.
  • Transcription accuracy can be affected by noisy environments or poor mic input; clipping is a potential issue.
  • Local AIPP requires downloading large GGUF models and potentially setting up llama.cpp or Ollama.
  • Uninstalling a package install removes system files but leaves user data; repo-clone installs are removed with uninstall.sh.
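
The clipping caveat above can be checked on a test recording before blaming the transcriber. The sketch below is not part of VOXD: it assumes a 16-bit PCM WAV file, and `clipping_ratio` and `read_pcm16` are hypothetical helper names.

```python
import array
import sys
import wave


def clipping_ratio(samples, limit: int = 32767) -> float:
    """Fraction of 16-bit samples sitting at positive or negative full scale."""
    if len(samples) == 0:
        return 0.0
    clipped = sum(1 for s in samples if s >= limit or s <= -limit - 1)
    return clipped / len(samples)


def read_pcm16(path: str) -> array.array:
    """Load the samples of a 16-bit PCM WAV file as signed integers."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        # WAV data is little-endian; swap on big-endian hosts.
        samples.byteswap()
    return samples
```

A ratio noticeably above zero on ordinary speech suggests the microphone gain is too high; lowering the input level is usually a better fix than any post-processing.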

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

12 stars in the last 30 days
