vocalinux by jatinkrmalik

Offline voice dictation for Linux

Created 1 year ago
282 stars

Top 92.4% on SourcePulse

Project Summary

This project provides a free, open-source, fully offline voice dictation tool for Linux, filling the gap left by the platform's lack of native voice typing. It lets users dictate text into any application under both X11 and Wayland, and its 100% offline architecture keeps voice data on the local machine. It suits general users who want better accessibility and privacy, as well as developers interested in offline speech recognition on Linux.

How It Works

Vocalinux supports several speech recognition engines and defaults to whisper.cpp, chosen for low latency and broad hardware compatibility. GPU acceleration uses Vulkan, so it works on AMD, Intel, and NVIDIA hardware. Recognized text is injected into the target application with ydotool; text containing non-ASCII characters falls back to clipboard pasting. Together these keep dictation working across different Linux desktop environments and on both X11 and Wayland.
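The non-ASCII fallback described above can be sketched as a small decision function. This is a minimal illustration, not Vocalinux's actual code: `has_non_ascii` and `choose_injection_method` are hypothetical names, and the sketch only reports which path would be taken instead of calling ydotool or touching the clipboard.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the injection decision described above:
# pure-ASCII text is typed with ydotool, anything else goes through the clipboard.

# Succeeds (exit 0) when the argument contains at least one non-ASCII byte.
has_non_ascii() {
  # Delete every byte in the ASCII range (octal 000-177); whatever survives is non-ASCII.
  [ -n "$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')" ]
}

# Prints the injection path that would be used for the given text.
choose_injection_method() {
  if has_non_ascii "$1"; then
    echo "clipboard"   # e.g. accented letters, CJK, emoji
  else
    echo "ydotool"     # plain ASCII can be typed key by key
  fi
}

choose_injection_method "hello world"   # prints: ydotool
choose_injection_method "café"          # prints: clipboard
```

Working on raw bytes under `LC_ALL=C` keeps the check independent of the user's locale: any UTF-8 character outside ASCII is encoded with bytes of 0x80 or higher, so it survives the `tr` deletion.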

Quick Start & Requirements

The recommended installation method is an interactive script:

    curl -fsSL raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

This script auto-detects hardware, recommends an engine, and installs dependencies.

  • Primary Install: Interactive script (see above) or install from source.
  • Prerequisites: Linux (Ubuntu 22.04+, Debian 11+, Fedora 39+, Arch, openSUSE), Python 3.9+, microphone. Vulkan-capable GPU recommended.
  • Setup Time: Approximately 1-2 minutes for the default installation.
  • Links: Installer script, vocalinux.com (implied), GitHub Releases page.
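This page does not describe how install.sh detects hardware, so the following is only a rough sketch of that kind of check, not the script's real logic. It assumes `lspci` output as the hardware source; the function names and the engine identifiers (`whisper.cpp`, `openai-whisper`, `vosk`) are hypothetical. The one rule it encodes from the project description is that the OpenAI Whisper engine requires an NVIDIA GPU.

```bash
#!/usr/bin/env bash
# Hypothetical hardware-detection sketch; NOT the actual logic of install.sh.

# Reads `lspci`-style lines on stdin and prints the vendor of the first GPU found.
detect_gpu_vendor() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      *"VGA compatible controller"*|*"3D controller"*|*"Display controller"*)
        case "$line" in
          *NVIDIA*) echo nvidia; return ;;
          *AMD*|*ATI*) echo amd; return ;;
          *Intel*) echo intel; return ;;
        esac
        ;;
    esac
  done
  echo unknown
}

# Succeeds when the named engine can run with the given GPU vendor.
# Encodes the documented rule that the OpenAI Whisper (PyTorch) engine is NVIDIA-only.
engine_supported() {
  case "$1" in
    whisper.cpp|vosk) return 0 ;;
    openai-whisper) [ "$2" = nvidia ] ;;
    *) return 1 ;;
  esac
}

# Example: inspect the current machine if lspci is installed.
if command -v lspci >/dev/null 2>&1; then
  vendor=$(lspci | detect_gpu_vendor)
  echo "GPU vendor: $vendor"
  for engine in whisper.cpp openai-whisper vosk; do
    if engine_supported "$engine" "$vendor"; then
      echo "  $engine: supported"
    else
      echo "  $engine: needs an NVIDIA GPU"
    fi
  done
fi
```

Machines with several GPUs report only the first one `lspci` lists, which is a deliberate simplification for the sketch.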

Highlighted Details

  • Engine Flexibility: Supports whisper.cpp (default, Vulkan GPU), OpenAI Whisper (PyTorch, NVIDIA only), and VOSK (lightweight).
  • Universal GPU Acceleration: Leverages Vulkan for performance gains on most modern GPUs.
  • X11/Wayland Support: Robust compatibility with both display server protocols, including specific fixes for IBus on Wayland.
  • Privacy-Focused: Operates entirely offline, ensuring no voice data leaves the user's machine.
  • User Experience: Features system tray integration, customizable keyboard shortcuts (toggle/push-to-talk), and a graphical settings dialog.
  • Recent Improvements: Enhanced non-ASCII text handling, improved Wayland IBus detection, and added dependencies for Pop!_OS/Ubuntu 24.04+.
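The two shortcut modes listed above differ only in how key events map to recording state. The sketch below assumes the shortcut handler receives separate `press` and `release` events; `handle_key_event` and the `recording` variable are hypothetical names, not Vocalinux's implementation or settings keys.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two shortcut modes.
# toggle:       each key press flips recording on or off; releases are ignored.
# push-to-talk: recording runs only while the key is held down.

recording=off

# Usage: handle_key_event MODE EVENT   (MODE: toggle|push-to-talk, EVENT: press|release)
handle_key_event() {
  local mode="$1" event="$2"
  case "$mode:$event" in
    toggle:press)
      if [ "$recording" = on ]; then recording=off; else recording=on; fi ;;
    toggle:release) ;;                      # releases do not change state in toggle mode
    push-to-talk:press) recording=on ;;
    push-to-talk:release) recording=off ;;
    *) echo "unknown mode/event: $mode/$event" >&2; return 1 ;;
  esac
}

# Toggle: recording turns on at the first press and off at the second.
for ev in press release press release; do
  handle_key_event toggle "$ev"
  echo "toggle $ev -> $recording"
done
```

Toggle mode never looks at releases, while push-to-talk depends on them entirely.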

Maintenance & Community

The project is actively maintained, with recent beta releases focusing on compatibility and code quality. It is part of a broader "Voca Ecosystem" including macOS and planned Windows applications. Community contributions are welcomed via bug reports, feature requests, and code. Links to discussions and contribution guidelines are available.

Licensing & Compatibility

This project is licensed under the GNU General Public License v3.0 (GPLv3). As a copyleft license, GPLv3 requires derivative works to be distributed under the same license, which may impose restrictions on integration into closed-source commercial products.

Limitations & Caveats

The project is currently in beta (v0.10.2-beta). Nightly builds are available but may be unstable. Users on minimal or custom window manager setups might need to configure autostart manually. The OpenAI Whisper engine requires an NVIDIA GPU.
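Minimal window managers such as i3, sway, and Hyprland generally do not process XDG autostart entries on their own, so the usual workaround is to start the app from the window manager's config file. The helper below prints the line to add. It assumes the launcher command is `vocalinux` (check the command your installation actually provides), and `autostart_line` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the line that starts Vocalinux from a WM config.
# Assumes the launcher is on PATH as "vocalinux"; pass a different command as $2.
autostart_line() {
  local wm="$1" cmd="${2:-vocalinux}"
  case "$wm" in
    i3)       echo "exec --no-startup-id $cmd" ;;   # append to ~/.config/i3/config
    sway)     echo "exec $cmd" ;;                    # append to ~/.config/sway/config
    hyprland) echo "exec-once = $cmd" ;;             # append to ~/.config/hypr/hyprland.conf
    *) echo "no template for '$wm'" >&2; return 1 ;;
  esac
}

autostart_line i3   # prints: exec --no-startup-id vocalinux
```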

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 26
  • Issues (30d): 8
  • Star History: 64 stars in the last 30 days
