voicemode  by mbailey

Natural voice conversations for AI assistants

Created 3 months ago
290 stars

Top 90.8% on SourcePulse

Project Summary

Voice Mode enables natural, real-time voice conversations with AI coding assistants such as Claude Code, Gemini CLI, and Cursor. Audio can come from a local microphone or from a LiveKit room, and speech processing can be handled by any OpenAI-compatible Speech-to-Text (STT) or Text-to-Speech (TTS) service, including free, open-source local options such as Whisper.cpp and Kokoro. The result is low-latency, conversational voice interaction for programming and general productivity tasks.

How It Works

Voice Mode acts as an MCP (Model Context Protocol) server, facilitating voice communication between the user and AI assistants. It handles speech detection, transcription, and synthesis, with features like silence detection for a more natural conversational flow. Its OpenAI API compatibility allows for flexible routing to various STT/TTS providers, enabling users to optimize for cost, latency, or privacy by switching between cloud services and local models seamlessly.
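The routing idea described above can be sketched in a few lines. This is a minimal illustration and not Voice Mode's actual code: the `Provider` type, the `pick_provider` and `speech_url` helpers, and any base URLs passed to them are hypothetical names introduced for the example. The only detail taken from the OpenAI API itself is the `/audio/speech` path for TTS requests.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Provider:
    """A hypothetical OpenAI-compatible speech endpoint (illustrative only)."""
    name: str
    base_url: str  # API root of a local server or a cloud service
    local: bool    # True if audio never leaves the machine


def pick_provider(
    providers: Iterable[Provider],
    is_healthy: Callable[[Provider], bool],
    prefer_local: bool = True,
) -> Optional[Provider]:
    """Return the first healthy provider, trying the preferred kind first."""
    # sorted() is stable, so providers of the same kind keep their given order.
    ordered = sorted(providers, key=lambda p: p.local != prefer_local)
    for provider in ordered:
        if is_healthy(provider):
            return provider
    return None


def speech_url(provider: Provider) -> str:
    """Build the TTS request URL using the OpenAI-style /audio/speech path."""
    return provider.base_url.rstrip("/") + "/audio/speech"
```

Because every provider exposes the same request shape, switching between a cloud service and a local model only changes the base URL. Passing the health check in as a function keeps the selection logic easy to test without a network; a real check would probably probe the endpoint with a short timeout.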

Quick Start & Requirements

  • Install (recommended): curl -O https://getvoicemode.com/install.sh && bash install.sh
  • Prerequisites: Python 3.10+, microphone/speakers or LiveKit server, optional OpenAI API Key. Xcode (macOS) for Core ML acceleration. System dependencies vary by OS (e.g., portaudio, ffmpeg).
  • Setup: Automatic installer handles dependencies and configuration. Local STT/TTS services can be installed separately.
  • Docs: voice-mode.readthedocs.io
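Before running the installer, it can help to see which of the listed prerequisites are already present. The sketch below is an assumption about how such a check might look, not part of Voice Mode; `check_prerequisites` is a hypothetical helper that uses only the Python standard library and covers only the Python version plus the two example dependencies named above (ffmpeg and portaudio).

```python
import ctypes.util
import shutil
import sys


def check_prerequisites(min_python=(3, 10)) -> dict:
    """Report which prerequisites appear to be available on this machine."""
    return {
        # Voice Mode lists Python 3.10+ as a requirement.
        "python": sys.version_info >= min_python,
        # ffmpeg is a command-line tool, so look for it on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # portaudio is a shared library, so ask the system loader for it.
        "portaudio": ctypes.util.find_library("portaudio") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

A "missing" result is only a hint: library lookup behaves differently across operating systems, and the automatic installer may install these dependencies itself.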

Highlighted Details

  • Supports multiple AI coding assistants including Claude Code, Gemini CLI, Cursor, VS Code, and Zed.
  • Features local STT (Whisper.cpp) and TTS (Kokoro) for privacy and offline use.
  • OpenAI API compatibility allows for transparent routing and provider flexibility.
  • Automatic transport selection and silence detection for natural conversations.
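Silence detection is what lets a voice turn end without a button press. The sketch below shows one simple approach, an energy threshold over fixed-size audio frames. It is an assumed illustration rather than Voice Mode's implementation, which may use a different technique; the function names, the threshold of 500, and the 25-frame limit are all hypothetical values chosen for the example.

```python
import math
import struct
from typing import Iterable


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def frames_until_silence(
    frames: Iterable[bytes],
    threshold: float = 500.0,     # hypothetical RMS level treated as silence
    max_silent_frames: int = 25,  # hypothetical: about 0.5 s of 20 ms frames
) -> int:
    """Count frames consumed until trailing silence ends the speaker's turn."""
    silent_run = 0
    heard_speech = False
    consumed = 0
    for frame in frames:
        consumed += 1
        if frame_rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            # Only count silence after speech, so a slow start is not cut off.
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    return consumed
```

Ignoring silence before the first loud frame means the recorder waits for the user to start speaking, and resetting the counter on every loud frame tolerates short pauses mid-sentence. A fixed energy threshold is sensitive to background noise, so production systems often use a trained voice-activity detector instead.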

Maintenance & Community

  • Community: Discord server available.
  • Social: Twitter/X (@getvoicemode), YouTube (@getvoicemode).

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits commercial use and linking from closed-source software.

Limitations & Caveats

  • WSL2 users may require additional configuration for microphone access.
  • Performance of local STT/TTS services depends on hardware.
  • Some advanced features or integrations might be in preview or require specific configurations.

Health Check

  • Last Commit: 22 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 16
  • Issues (30d): 4
  • Star History: 81 stars in the last 30 days
