macos-local-voice-agents by kwindla

Local voice agents for macOS

Created 1 month ago
275 stars

Top 94.1% on SourcePulse

Project Summary

This repository demonstrates a real-time voice agent that runs entirely locally on macOS, built with the Pipecat framework. It targets developers and researchers interested in low-latency, on-device voice AI, with voice-to-voice latency under 800 ms achievable on M-series Macs running capable models.

How It Works

The system runs a pipeline of local models: Silero VAD for voice activity detection, MLX Whisper for speech-to-text, Gemma 3n 4B as the language model, and Kokoro for text-to-speech. Audio flows between the agent and the client over a low-latency, serverless WebRTC connection optimized for real-time use. The architecture is modular, so individual models can be swapped out and the pipeline customized, including tool calling and parallel processing; a conceptual sketch of the stage ordering follows.
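The sketch below is a minimal, self-contained illustration of that stage ordering (VAD → STT → LLM → TTS) and of the swap-a-stage modularity; the Stage class and the lambdas are hypothetical stand-ins, not Pipecat's actual API or this repository's code.

    # Conceptual sketch only: Stage and the lambdas are hypothetical stand-ins
    # for the real Pipecat processors; they show the ordering of the local
    # pipeline and how any stage can be swapped for a different model.
    from dataclasses import dataclass
    from typing import Any, Callable, List

    @dataclass
    class Stage:
        name: str                         # e.g. "Silero VAD", "MLX Whisper"
        process: Callable[[Any], Any]     # consumes the previous stage's output

    def run_pipeline(stages: List[Stage], frame: Any) -> Any:
        # Each stage feeds the next, mirroring how audio moves through the
        # local voice agent: speech gate -> transcript -> reply -> audio.
        for stage in stages:
            frame = stage.process(frame)
        return frame

    pipeline = [
        Stage("Silero VAD", lambda audio: audio),                    # pass detected speech through
        Stage("MLX Whisper STT", lambda audio: "hello there"),       # audio -> text
        Stage("Gemma 3n 4B LLM", lambda text: f"reply to: {text}"),  # text -> response text
        Stage("Kokoro TTS", lambda text: b"pcm-audio-bytes"),        # response text -> audio
    ]

    print(run_pipeline(pipeline, b"mic-audio-frame"))

Swapping in a different STT or TTS model amounts to replacing the corresponding stage, which is the same modularity the real pipeline exposes.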

Quick Start & Requirements

  • Installation: Clone the repository, change into server/, and either launch directly with uv run bot.py (uv sets up the environment and installs dependencies automatically) or create a virtual environment manually: python3.12 -m venv venv && source venv/bin/activate && pip install -r requirements.txt && python bot.py.
  • Prerequisites: macOS, Python 3.12, and an OpenAI-compatible LLM server (LM Studio is recommended); a minimal client-configuration sketch follows this list. Initial model downloads can take 30+ seconds.
  • Web Client: Navigate to client/, run npm i, and then npm run dev. Access the client via the URL provided in the terminal.
  • Docs: Voice AI & Voice Agents Illustrated Guide.
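The agent talks to whatever OpenAI-compatible endpoint you point it at. As a minimal sketch (not code from this repo): LM Studio's local server is assumed to be at its default http://localhost:1234/v1, and the model identifier is whatever your server exposes for the loaded Gemma model.

    # Sketch: exercise a local OpenAI-compatible server (e.g. LM Studio) with
    # the standard openai client. base_url, api_key, and the model name are
    # assumptions for illustration, not values taken from this repository.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed-locally")

    response = client.chat.completions.create(
        model="gemma-3n-e4b",  # use the identifier your local server lists
        messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    )
    print(response.choices[0].message.content)

If this round-trips, the bot's LLM stage has a working local endpoint to call.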

Highlighted Details

  • Achieves sub-800ms voice-to-voice latency on M-series Macs.
  • Employs serverless WebRTC for low-latency, real-time audio communication.
  • Supports local, OpenAI-compatible LLM servers like LM Studio.

Maintenance & Community

The project is maintained by kwindla. Further community and contribution details are not specified in the README.

Licensing & Compatibility

The repository's licensing is not explicitly stated in the provided README content. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

Initial startup can exceed 30 seconds while models are downloaded and loaded; the README suggests setting HF_HUB_OFFLINE=1 to speed up subsequent startups once the models are cached. The license, and with it suitability for commercial use, also remains unstated.
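As a sketch of that offline flag (HF_HUB_OFFLINE is the standard huggingface_hub setting; placing it before any Hub-touching imports is the only requirement, and the commented bot import is illustrative):

    # Sketch: skip Hugging Face Hub network checks once models are cached.
    # HF_HUB_OFFLINE must be set before importing libraries that use the Hub.
    import os

    os.environ["HF_HUB_OFFLINE"] = "1"

    # ...then import and run the agent as usual, e.g.:
    # import bot

Setting the variable in the shell before launching bot.py has the same effect.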

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 109 stars in the last 30 days
