voicemode by mbailey

Natural voice conversations for AI assistants

Created 8 months ago

782 stars

Top 44.8% on SourcePulse

2 Experts Love This Project

jrk: Jonathan Ragan-Kelley, Professor at MIT
dguido: Cofounder of Trail of Bits

Project Summary

Voice Mode enables natural, real-time voice conversations with AI coding assistants like Claude Code, Gemini CLI, and Cursor. It supports local microphone or LiveKit room-based communication and is compatible with any OpenAI-compatible Speech-to-Text (STT) and Text-to-Speech (TTS) services, including free, open-source local options like Whisper.cpp and Kokoro. This allows for low-latency, human-like voice interactions for programming, productivity, and more.

How It Works

Voice Mode acts as an MCP (Model Context Protocol) server, facilitating voice communication between the user and AI assistants. It handles speech detection, transcription, and synthesis, with features like silence detection for a more natural conversational flow. Its OpenAI API compatibility allows for flexible routing to various STT/TTS providers, enabling users to optimize for cost, latency, or privacy by switching between cloud services and local models seamlessly.
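To make the silence-detection idea concrete, here is a minimal sketch of energy-based end-of-turn detection. It is not Voice Mode's actual implementation: the function names, the RMS threshold, and the frame counts are all assumptions chosen for illustration, and real systems often use a trained voice-activity detector instead.

```python
import math
from typing import Iterable, Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def end_of_utterance(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,     # hypothetical energy threshold for "speech"
    max_silent_frames: int = 3,  # hypothetical run of quiet frames that ends a turn
) -> Optional[int]:
    """Return the index of the frame at which the speaker is considered done.

    Leading silence is ignored; only a run of quiet frames that follows
    speech counts. Returns None if the stream ends before that happens.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0  # speech resumed, so reset the quiet counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= max_silent_frames:
                return i
    return None
```

Resetting the counter whenever speech resumes keeps short pauses inside a sentence from ending the turn early, which is the property that makes the exchange feel conversational.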

Quick Start & Requirements

Primary Install (recommended): curl -O https://getvoicemode.com/install.sh && bash install.sh
Prerequisites: Python 3.10+; a microphone and speakers, or a LiveKit server; an OpenAI API key (optional). On macOS, Xcode is needed for Core ML acceleration. System dependencies vary by OS (e.g., portaudio, ffmpeg).
Setup: Automatic installer handles dependencies and configuration. Local STT/TTS services can be installed separately.
Docs: voice-mode.readthedocs.io
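Before running the installer, it can help to confirm the listed prerequisites. The sketch below is a hypothetical helper, not part of Voice Mode: it checks the interpreter version and looks for ffmpeg on PATH. portaudio is a shared library rather than an executable, so it is not covered by this check.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10), tools=("ffmpeg",)):
    """Return a dict mapping each requirement name to True (met) or False."""
    label = "python>=%d.%d" % min_python
    results = {label: sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results
```

Calling check_prerequisites() and printing the resulting dict shows at a glance which requirement still needs attention.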

Highlighted Details

Supports multiple AI coding assistants including Claude Code, Gemini CLI, Cursor, VS Code, and Zed.
Features local STT (Whisper.cpp) and TTS (Kokoro) for privacy and offline use.
OpenAI API compatibility lets requests be routed to any compatible STT/TTS provider, cloud or local.
Automatic transport selection (local microphone or LiveKit room) and silence detection for natural conversations.
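The provider flexibility described above can be pictured as a preference-ordered fallback over OpenAI-compatible endpoints. The following is a sketch under stated assumptions, not Voice Mode's routing code: the Provider type, the pick_provider function, the health-check callback, and the localhost URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Provider:
    name: str
    base_url: str  # root of an OpenAI-compatible API
    local: bool    # True when the service runs on this machine


def pick_provider(
    providers: Sequence[Provider],
    is_healthy: Callable[[Provider], bool],
    prefer_local: bool = True,
) -> Optional[Provider]:
    """Return the first healthy provider, trying the preferred kind first."""
    # sorted() is stable, so providers of the same kind keep their given order.
    ordered = sorted(providers, key=lambda p: p.local != prefer_local)
    for provider in ordered:
        if is_healthy(provider):
            return provider
    return None


# Hypothetical endpoints for illustration only.
LOCAL_TTS = Provider("local-tts", "http://127.0.0.1:8880/v1", local=True)
CLOUD_TTS = Provider("cloud-tts", "https://api.openai.com/v1", local=False)
```

With prefer_local=True a privacy-minded setup uses the local model when it is up and falls back to the cloud otherwise; flipping the flag favors the cloud service instead.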

Maintenance & Community

Community: Discord server available.
Social: Twitter/X (@getvoicemode), YouTube (@getvoicemode).

Licensing & Compatibility

License: MIT.
Compatibility: The MIT license permits commercial use and linking from closed-source software.

Limitations & Caveats

WSL2 users may require additional configuration for microphone access.
Performance of local STT/TTS services depends on hardware.
Some advanced features or integrations might be in preview or require specific configurations.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 52
Issues (30d): 22

Star History

160 stars in the last 30 days

Explore Similar Projects

LingEcho-App by code-100-precent

An intelligent voice interaction platform for AI

Created 2 months ago

Updated 4 days ago

Starred by Georgi Gerganov (Author of llama.cpp, whisper.cpp).

pi-card by nkasmanoff

Voice assistant for Raspberry Pi

Created 1 year ago

Updated 1 year ago

OpenVoiceChat by Finity-Alpha

Natural voice conversations with LLMs

Created 2 years ago

Updated 1 month ago

Vocalis by Lex-au

AI speech-to-speech assistant enabling natural, multimodal conversations

Created 11 months ago

Updated 10 months ago

voice-assistant-whisper-chatgpt by bhattbhavesh91

AI-powered voice assistant creation

Created 3 years ago

Updated 3 years ago

LocalAIVoiceChat by KoljaB

Local AI voice chat for real-time conversations

Created 2 years ago

Updated 8 months ago

whisplay-ai-chatbot by PiSugar

Pocket AI assistant like a futuristic walkie-talkie

Created 9 months ago

Updated 2 days ago

N.E.K.O by Project-N-E-K-O

Real-time AI companion for seamless, multi-scenario voice interaction

Created 8 months ago

Updated 17 hours ago

jarvis by llm-guy

Local voice-controlled AI assistant

Created 7 months ago

Updated 5 months ago

natively-cluely-ai-assistant by evinjohnn

Real-time, privacy-first AI assistant for live conversations

Created 4 weeks ago

Updated 1 day ago

Starred by Luis Capelo (Cofounder of Lightning AI) and Long Ouyang (Research Scientist at OpenAI).

ElatoAI by akdeb

Realtime speech AI agents for ESP32 devices

Created 10 months ago

Updated 3 days ago

Starred by Chaoyu Yang (Founder of Bento), Nir Gazit (Cofounder of Traceloop), and 4 more.

pipecat by pipecat-ai

Open-source framework for building real-time voice and multimodal conversational AI agents

Created 2 years ago

Updated 17 hours ago
