whisper_dictation by themanyone

Voice keyboard for local AI chat, image gen, webcam, & voice control

Created 2 years ago

285 stars

Top 91.9% on SourcePulse

Project Summary

This project provides a private, voice-controlled interface for interacting with a computer, integrating speech-to-text, AI chat, image generation, and system control. It targets users seeking a hands-free, AI-powered computing experience, akin to a "ship's computer," enabling tasks like dictation, web searches, and application launching via voice commands.

How It Works

The system uses whisper.cpp for efficient, local speech-to-text and translation, which keeps external dependencies to a minimum. Transcribed voice commands are parsed and then carried out with pyautogui, which handles system control and application launching. Optionally, it integrates with local LLMs (served through llama.cpp) or cloud services (OpenAI, Gemini) for AI chat, with mimic3 or piper for text-to-speech, and with a local Stable Diffusion instance for image generation.
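
The parsing step described above can be illustrated with a small dispatcher. This is a minimal sketch, not the project's actual code: the command phrases ("open", "launch", "search for") and the function name parse_utterance are hypothetical, and the real project may recognize different phrases. Anything that does not match a command falls through as plain dictation.

```python
import re

# Hypothetical command patterns; the real project's phrases may differ.
COMMANDS = [
    ("open", re.compile(r"^(?:open|launch)\s+(?P<arg>.+)$", re.IGNORECASE)),
    ("search", re.compile(r"^search(?:\s+the\s+web)?\s+for\s+(?P<arg>.+)$", re.IGNORECASE)),
]


def parse_utterance(text):
    """Return an (action, argument) pair for one transcribed utterance."""
    # Speech-to-text output usually ends with punctuation; drop it before matching.
    cleaned = text.strip().rstrip(".!?").strip()
    for action, pattern in COMMANDS:
        match = pattern.match(cleaned)
        if match:
            return action, match.group("arg").strip()
    # No command matched: treat the utterance as text to be typed.
    return "dictate", text.strip()


if __name__ == "__main__":
    for phrase in ("Open Firefox.", "search for local weather", "Hello world."):
        print(parse_utterance(phrase))
```

In a full pipeline, the "dictate" branch would hand its text to a typing call such as pyautogui.write, while "open" and "search" would launch a program or a browser. Keeping the parser free of side effects makes it easy to test without a microphone or a display.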

Quick Start & Requirements

Install GStreamer and ladspa-delay-so-delay-5s (via gstreamer1-plugins-bad-free-extras).
Install Python dependencies: pip install -r whisper_dictation/requirements.txt.
Build whisper.cpp with CUDA support: GGML_CUDA=1 make -j.
Run whisper.cpp server: ./whisper_cpp_server -l en -m models/ggml-tiny.en.bin --port 7777.
Full functionality requires at least 4 GiB of VRAM, especially when running LLMs and image generation.
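
Once the server from the steps above is listening on port 7777, a client can send it audio over HTTP. The sketch below is not taken from the project. It assumes the server exposes the /inference endpoint from whisper.cpp's server example, that the endpoint accepts a multipart form upload with a "file" field, and that the JSON reply carries the transcript under a "text" key. The helper names build_multipart and transcribe are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumed endpoint path, taken from whisper.cpp's server example; port from the step above.
SERVER_URL = "http://127.0.0.1:7777/inference"


def build_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one WAV file as a multipart/form-data body."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, url=SERVER_URL):
    """POST a WAV file to the server and return the transcribed text."""
    with open(wav_path, "rb") as handle:
        audio = handle.read()
    body, content_type = build_multipart(
        {"temperature": "0.0", "response_format": "json"}, "audio.wav", audio
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # Assumption: the reply looks like {"text": "..."}.
    return reply.get("text", "").strip()
```

Calling transcribe("sample.wav") with the server running would return the recognized text as a string. Only the standard library is used here, in keeping with the project's aim of a small dependency footprint.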

Highlighted Details

Reduced dependencies by eliminating torch, pycuda, cudnn, and ffmpeg.
Stable Diffusion can run with as little as 2 GiB VRAM using --medvram or --lowvram flags.
Supports local LLMs via llama.cpp and optional cloud APIs (OpenAI, Gemini).
Enables voice-controlled webcam, audio recording, and application launching.

Maintenance & Community

Developed by Henry Kroll III (themanyone).
Links to GitHub, YouTube, Mastodon, LinkedIn, and a "Buy Me a Coffee" page are provided.
The project notes that mimic3 may be abandoned in favor of piper.

Licensing & Compatibility

Licensed under MIT.
The permissive license allows modification and use, including in commercial applications.

Limitations & Caveats

mimic3 is noted as potentially abandoned, with piper suggested as a replacement.
High VRAM usage can occur with large models and context windows, potentially leading to crashes.
Performance may vary based on hardware, especially for LLM inference and image generation.

Health Check

Last Commit: 1 week ago
Responsiveness: 1 day
Pull Requests (30d): 1
Issues (30d): 1

Star History

7 stars in the last 30 days

Explore Similar Projects

whisperIME by woheller69

Android IME based on Whisper

Created 1 year ago

Updated 4 months ago

Starred by Elvis Saravia (Founder of DAIR.AI).

S.A.T.U.R.D.A.Y by GRVYDEV

Vocal computing toolbox for building voice interfaces to LLMs

Created 2 years ago

Updated 2 years ago

Starred by Georgi Gerganov (Author of llama.cpp, whisper.cpp).

pi-card by nkasmanoff

Voice assistant for Raspberry Pi

Created 1 year ago

Updated 1 year ago

gpt-voice-conversation-chatbot by Adri6336

Voice chatbot for engaging spoken conversations with ChatGPT/GPT-4

Created 2 years ago

Updated 1 year ago

ChatWaifuL2D by cjyaddone

ChatGPT voice chat program with TTS

Created 3 years ago

Updated 1 year ago

voicechat2 by lhl

Local AI voicechat using WebSockets

Created 1 year ago

Updated 1 year ago

unity-AI-Chat-Toolkit by zhangliwei7758

Unity toolkit for AI chat functionality

Created 2 years ago

Updated 6 months ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

ChatdollKit by uezo

3D virtual assistant SDK for voice-enabled chatbots using 3D models

Created 5 years ago

Updated 1 month ago

Starred by Emile Vauge (Founder of Traefik).

Scriberr by rishikanthc

Self-hosted app for local AI audio transcription

Created 1 year ago

Updated 4 days ago

xiaozhi-android-client by TOM88812

Cross-platform Flutter app for AI voice/text chat

Created 10 months ago

Updated 1 week ago

Starred by Victor Taelin (Author of Bend, Kind, HVM) and Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research).

chat-with-gpt by cogentapps

Open-source ChatGPT app with voice

Created 2 years ago

Updated 1 year ago

Starred by Tim J. Baek (Founder of Open WebUI).

WhisperLiveKit by QuentinFuxa

Python package for real-time, local speech-to-text

Created 1 year ago

Updated 2 days ago
