vocalinux by jatinkrmalik

Offline voice dictation for Linux

Created 1 year ago

636 stars

Top 51.3% on SourcePulse

Project Summary

This project provides a free, open-source, and fully offline voice dictation solution for Linux, addressing a gap in native voice typing capabilities. It enables users to dictate text into any application across X11 and Wayland environments, prioritizing privacy with its 100% offline architecture. The application is suitable for general users seeking enhanced accessibility and privacy, as well as developers interested in offline speech recognition on Linux.

How It Works

Vocalinux integrates multiple speech recognition engines and defaults to whisper.cpp for low latency and broad hardware compatibility. It offers GPU acceleration via Vulkan, supporting AMD, Intel, and NVIDIA hardware. Text is injected into applications through ydotool, with a fallback to clipboard pasting for non-ASCII characters, so dictation works across Linux desktop environments and windowing systems.
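The ydotool-with-clipboard-fallback behavior described above can be sketched roughly as follows. This is an illustration only, not Vocalinux's actual code: the function names `choose_injection_method` and `build_injection_command` are hypothetical, and using `wl-copy` for the clipboard step is an assumption (an X11 session would need a different clipboard tool).

```python
from typing import List


def choose_injection_method(text: str) -> str:
    """Return "ydotool" for pure-ASCII text, otherwise "clipboard"."""
    # str.isascii() is True when every code point is below 128
    # (and also for the empty string).
    return "ydotool" if text.isascii() else "clipboard"


def build_injection_command(text: str) -> List[str]:
    """Build the argv list for the chosen method. Nothing is executed here."""
    if choose_injection_method(text) == "ydotool":
        # `ydotool type <text>` simulates typing the given string.
        return ["ydotool", "type", text]
    # Assumption: wl-copy puts the text on the Wayland clipboard; a separate
    # paste keystroke would still have to be sent afterwards.
    return ["wl-copy", text]
```

The sketch only builds the command; a caller could pass the returned list to `subprocess.run` once it has confirmed that the relevant tool is installed.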

Quick Start & Requirements

The recommended installation method is an interactive script:

    curl -fsSL raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

This script auto-detects hardware, recommends an engine, and installs dependencies.

Primary Install: Interactive script (see above) or install from source.
Prerequisites: Linux (Ubuntu 22.04+, Debian 11+, Fedora 39+, Arch, openSUSE), Python 3.9+, microphone. Vulkan-capable GPU recommended.
Setup Time: Approximately 1-2 minutes for the default installation.
Links: Installer script, vocalinux.com (implied), GitHub Releases page.
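Before running the installer, the prerequisites listed above can be checked with a few lines of Python. This is a minimal sketch, not part of Vocalinux: the function name `check_prerequisites` is hypothetical, and treating the presence of the `vulkaninfo` command as a hint that Vulkan tooling is installed is an assumption (it does not prove a Vulkan-capable GPU is present).

```python
import shutil
import sys
from typing import Dict

MIN_PYTHON = (3, 9)  # minimum version from the prerequisites above


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites appear to be satisfied on this machine."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python>=3.9": sys.version_info[:2] >= MIN_PYTHON,
        # shutil.which returns None when the command is not on PATH.
        "ydotool": shutil.which("ydotool") is not None,
        # Assumption: vulkaninfo on PATH suggests Vulkan tooling is installed.
        "vulkaninfo": shutil.which("vulkaninfo") is not None,
    }
```

Printing the returned dictionary gives a quick overview; a `False` entry for `vulkaninfo` does not block installation, since the GPU is only recommended.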

Highlighted Details

Engine Flexibility: Supports whisper.cpp (default, Vulkan GPU), OpenAI Whisper (PyTorch, NVIDIA only), and VOSK (lightweight).
Universal GPU Acceleration: Leverages Vulkan for performance gains on most modern GPUs.
X11/Wayland Support: Robust compatibility with both display server protocols, including specific fixes for IBus on Wayland.
Privacy-Focused: Operates entirely offline, ensuring no voice data leaves the user's machine.
User Experience: Features system tray integration, customizable keyboard shortcuts (toggle/push-to-talk), and a graphical settings dialog.
Recent Improvements: Enhanced non-ASCII text handling, improved Wayland IBus detection, and added dependencies for Pop!_OS/Ubuntu 24.04+.
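The engine options above can be summarized as a small lookup that filters engines by hardware. This is a hedged sketch: the `Engine` class and `usable_engines` function are hypothetical names, and the only constraint encoded is the one this summary states (OpenAI Whisper needs an NVIDIA GPU). How the real installer chooses its recommendation is not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Engine:
    name: str
    requires_nvidia: bool
    note: str


# Engine list taken from the summary above; the notes are paraphrased.
ENGINES = [
    Engine("whisper.cpp", False, "default; Vulkan GPU acceleration"),
    Engine("OpenAI Whisper", True, "PyTorch-based"),
    Engine("VOSK", False, "lightweight"),
]


def usable_engines(has_nvidia_gpu: bool) -> List[str]:
    """Return engine names whose hardware requirement is met."""
    return [e.name for e in ENGINES if has_nvidia_gpu or not e.requires_nvidia]
```

For example, `usable_engines(False)` leaves out OpenAI Whisper, while `usable_engines(True)` returns all three engines in their listed order.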

Maintenance & Community

The project is actively maintained, with recent beta releases focusing on compatibility and code quality. It is part of a broader "Voca Ecosystem" including macOS and planned Windows applications. Community contributions are welcomed via bug reports, feature requests, and code. Links to discussions and contribution guidelines are available.

Licensing & Compatibility

This project is licensed under the GNU General Public License v3.0 (GPLv3). As a copyleft license, GPLv3 requires derivative works to be distributed under the same license, which may impose restrictions on integration into closed-source commercial products.

Limitations & Caveats

The project is currently in beta (v0.10.2-beta). Nightly builds are available but may be unstable. Users on minimal or custom window manager setups might need to configure autostart manually. The OpenAI Whisper engine requires an NVIDIA GPU.

Explore Similar Projects

whispering by braden-w

VoiceFlow by infiniV

voxd by jakovius

claude-stt by jarrodwatts

pindrop by watzon

voxt by hehehai

ollama-voice-mac by apeatling

tambourine-voice by kstonekuan

voxtype by peteonrails

Scriberr by rishikanthc

FluidVoice by altic-dev

sherpa-onnx by k2-fsa