themanyone / Voice keyboard for local AI chat, image gen, webcam, & voice control
Top 95.3% on SourcePulse
This project provides a private, voice-controlled interface for interacting with a computer, integrating speech-to-text, AI chat, image generation, and system control. It targets users seeking a hands-free, AI-powered computing experience, akin to a "ship's computer," enabling tasks like dictation, web searches, and application launching via voice commands.
How It Works
The system leverages whisper.cpp for efficient, local speech-to-text and translation, minimizing external dependencies. Voice commands are parsed to trigger actions using pyautogui for system control and application launching. It can optionally integrate local LLMs (such as llama.cpp) or cloud services (OpenAI, Gemini) for AI chat, mimic3 or piper for text-to-speech, and a local Stable Diffusion instance for image generation.
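As a rough illustration of the dispatch step, the sketch below routes transcribed text either to a command table or to dictation via pyautogui. The command names and the way transcripts arrive are hypothetical; the project's real parser and command set differ.

```python
# Sketch: route a transcript to a known voice command, or type it as dictation.
# COMMANDS and the example phrases are illustrative, not the project's actual set.
import subprocess
import pyautogui

COMMANDS = {
    "new paragraph": lambda: pyautogui.press("enter"),
    "open browser": lambda: subprocess.Popen(["firefox"]),
}

def handle(transcript: str) -> None:
    action = COMMANDS.get(transcript.strip().lower())
    if action:
        action()                                               # recognized system command
    else:
        pyautogui.typewrite(transcript + " ", interval=0.01)   # plain dictation

handle("open browser")   # launches an application
handle("Hello, world")   # falls through to typing
```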
Quick Start & Requirements
- Requires the ladspa-delay-so-delay-5s element (via gstreamer1-plugins-bad-free-extras).
- Install the Python dependencies: pip install -r whisper_dictation/requirements.txt.
- Build whisper.cpp with CUDA support: GGML_CUDA=1 make -j.
- Start the whisper.cpp server: ./whisper_cpp_server -l en -m models/ggml-tiny.en.bin --port 7777 (see the client sketch below).
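Once the server is up, the dictation client talks to it over HTTP. A minimal sketch in Python, assuming the binary serves whisper.cpp's standard /inference endpoint on port 7777 and that sample.wav is a 16 kHz mono recording (both the endpoint layout and the file name are assumptions, not taken from the project):

```python
# Send a WAV file to the local whisper.cpp server and print the transcript.
import requests

with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:7777/inference",
        files={"file": f},
        data={"temperature": "0.0", "response_format": "json"},
    )
resp.raise_for_status()
print(resp.json().get("text", ""))
```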
Highlighted Details

- Additional dependencies include torch, pycuda, cudnn, and ffmpeg.
- Image generation with local Stable Diffusion supports the --medvram or --lowvram flags for GPUs with limited VRAM.
- AI chat can run against local llama.cpp and optional cloud APIs (OpenAI, Gemini).
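On the chat side, a local llama.cpp server exposes an OpenAI-compatible chat endpoint, so switching between local and cloud backends is mostly a matter of changing the base URL and adding an API key. A minimal sketch, assuming llama.cpp's server is listening on its default port 8080 (the port, model label, and prompt are placeholders, not from the project):

```python
# Ask a locally hosted llama.cpp server a question via its
# OpenAI-compatible /v1/chat/completions endpoint.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "local",  # llama.cpp answers with whatever model it was started with
        "messages": [{"role": "user", "content": "Open the weather report and summarize it."}],
        "max_tokens": 128,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```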
Maintenance & Community

- mimic3 may be abandoned in favor of piper.

Licensing & Compatibility
Limitations & Caveats
- mimic3 is noted as potentially abandoned, with piper suggested as a replacement.
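If piper is swapped in for mimic3, the text-to-speech step can stay a simple pipe into the piper CLI. A minimal sketch, assuming piper is installed, a downloaded voice file named en_US-lessac-medium.onnx, and ALSA's aplay for playback (all three are assumptions about the local setup):

```python
# Synthesize a spoken reply with the piper CLI and play it back.
# Voice model path and the aplay player are placeholders for the local setup.
import subprocess

def speak(text: str, wav_path: str = "reply.wav") -> None:
    # piper reads text on stdin and writes a WAV file.
    subprocess.run(
        ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", wav_path],
        input=text.encode("utf-8"),
        check=True,
    )
    subprocess.run(["aplay", wav_path], check=True)

speak("Text to speech is ready.")
```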