Voice keyboard for local AI chat, image gen, webcam, & voice control
This project provides a private, voice-controlled interface for interacting with a computer, integrating speech-to-text, AI chat, image generation, and system control. It targets users seeking a hands-free, AI-powered computing experience, akin to a "ship's computer," enabling tasks like dictation, web searches, and application launching via voice commands.
How It Works
The system leverages whisper.cpp for efficient, local speech-to-text and translation, minimizing external dependencies. Voice commands are parsed to trigger actions using pyautogui for system control and application launching. It can optionally integrate with local LLMs (like llama.cpp) or cloud services (OpenAI, Gemini) for AI chat and text-to-speech via mimic3 or piper, and local Stable Diffusion for image generation.
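To illustrate the command-parsing step, a dispatcher might match transcribed phrases against simple verb prefixes and route them to an action. This is a hypothetical sketch: the phrases and action names below are not the project's actual command grammar, and the real implementation would forward matches to pyautogui or a subprocess rather than just returning tuples.

```python
# Hypothetical sketch of voice-command dispatch: transcribed text is
# matched against verb prefixes; matched commands would be forwarded to
# pyautogui (typing, hotkeys) or subprocess (app launching) in practice.
# Phrases and action names here are illustrative only.

def parse_command(text: str):
    """Map a transcribed phrase to an (action, argument) pair, or None."""
    text = text.strip().lower().rstrip(".!?")
    if text.startswith("open "):
        # A real implementation might launch the app via subprocess.Popen.
        return ("launch", text[len("open "):])
    if text.startswith("search for "):
        # A real implementation might open a browser search via pyautogui.
        return ("web_search", text[len("search for "):])
    if text.startswith("type "):
        # A real implementation might call pyautogui.typewrite(...).
        return ("dictate", text[len("type "):])
    return None  # Unrecognized phrases could fall through to AI chat.

print(parse_command("Open Firefox."))       # ('launch', 'firefox')
print(parse_command("what time is it"))     # None
```

Keeping the parser a pure function (text in, action out) makes the voice layer easy to test without a display or audio hardware.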
Quick Start & Requirements
- Install the ladspa-delay-so-delay-5s plugin (via gstreamer1-plugins-bad-free-extras).
- Install Python dependencies: pip install -r whisper_dictation/requirements.txt
- Build whisper.cpp with CUDA support: GGML_CUDA=1 make -j
- Start the whisper.cpp server: ./whisper_cpp_server -l en -m models/ggml-tiny.en.bin --port 7777
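Once the server is started, a quick sanity check is to confirm something is listening on the port from the command above. This is a minimal sketch, not part of the project; it only tests TCP reachability, not the transcription API itself.

```python
# Minimal reachability check for the local whisper.cpp server started
# above (port 7777 matches the command in the quick start). This only
# verifies that something accepts a TCP connection on the port.
import socket

def server_listening(host: str = "127.0.0.1", port: int = 7777,
                     timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(server_listening())  # True once ./whisper_cpp_server is up
```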
Highlighted Details
- GPU acceleration relies on torch, pycuda, cudnn, and ffmpeg.
- Stable Diffusion can run on lower-VRAM GPUs with the --medvram or --lowvram flags.
- AI chat works with local llama.cpp and optional cloud APIs (OpenAI, Gemini).
Maintenance & Community
- mimic3 may be abandoned in favor of piper.
Licensing & Compatibility
Limitations & Caveats
- mimic3 is noted as potentially abandoned, with piper suggested as a replacement.
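Given the mimic3/piper uncertainty, a script wrapping this project might probe for whichever TTS engine is installed before committing to one. The sketch below assumes the engines are invoked as `piper` and `mimic3` executables on PATH; it only locates a binary and does not show either engine's actual invocation flags.

```python
# Sketch: prefer piper if installed, fall back to mimic3, else None.
# Only checks for executables on PATH via shutil.which; real invocation
# flags for either engine are out of scope here.
import shutil

def pick_tts_engine(preferred=("piper", "mimic3")):
    """Return the first available TTS executable name, or None."""
    for name in preferred:
        if shutil.which(name):
            return name
    return None

engine = pick_tts_engine()
print(f"TTS engine: {engine or 'none found'}")
```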