macos-local-voice-agents by kwindla

Local voice agents for macOS

Created 1 month ago
275 stars

Top 94.1% on SourcePulse

Project Summary

This repository demonstrates a real-time voice agent that runs entirely locally on macOS, built with the Pipecat framework. It targets developers and researchers interested in low-latency, on-device voice AI, with voice-to-voice latency under 800 ms achievable on M-series Macs running capable models.

How It Works

The system runs a pipeline of local models: Silero VAD for voice activity detection, MLX Whisper for speech-to-text, Gemma 3n 4B as the language model, and Kokoro for text-to-speech. Audio flows between the agent and the client over a low-latency, serverless WebRTC connection optimized for real-time use. The architecture is modular, so individual models can be swapped out and the pipeline customized, including tool calling and parallel processing; a conceptual sketch of the stage ordering follows.
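The sketch below is a minimal, self-contained illustration of that stage ordering (VAD → STT → LLM → TTS) and of the swap-a-stage modularity; the Stage class and the lambdas are hypothetical stand-ins, not Pipecat's actual API or this repository's code.

    # Conceptual sketch only: Stage and the lambdas are hypothetical stand-ins
    # for the real Pipecat processors; they show the ordering of the local
    # pipeline and how any stage can be swapped for a different model.
    from dataclasses import dataclass
    from typing import Any, Callable, List

    @dataclass
    class Stage:
        name: str                         # e.g. "Silero VAD", "MLX Whisper"
        process: Callable[[Any], Any]     # consumes the previous stage's output

    def run_pipeline(stages: List[Stage], frame: Any) -> Any:
        # Each stage feeds the next, mirroring how audio moves through the
        # local voice agent: speech gate -> transcript -> reply -> audio.
        for stage in stages:
            frame = stage.process(frame)
        return frame

    pipeline = [
        Stage("Silero VAD", lambda audio: audio),                    # pass detected speech through
        Stage("MLX Whisper STT", lambda audio: "hello there"),       # audio -> text
        Stage("Gemma 3n 4B LLM", lambda text: f"reply to: {text}"),  # text -> response text
        Stage("Kokoro TTS", lambda text: b"pcm-audio-bytes"),        # response text -> audio
    ]

    print(run_pipeline(pipeline, b"mic-audio-frame"))

Swapping in a different STT or TTS model amounts to replacing the corresponding stage, which is the same modularity the real pipeline exposes.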

Quick Start & Requirements

  • Installation: Clone the repository, change into server/, and either launch directly with uv run bot.py (uv sets up the environment and installs dependencies automatically) or create a virtual environment manually: python3.12 -m venv venv && source venv/bin/activate && pip install -r requirements.txt && python bot.py.
  • Prerequisites: macOS, Python 3.12, and an OpenAI-compatible LLM server (LM Studio is recommended); a minimal client-configuration sketch follows this list. Initial model downloads can take 30+ seconds.
  • Web Client: Navigate to client/, run npm i, and then npm run dev. Access the client via the URL provided in the terminal.
  • Docs: Voice AI & Voice Agents Illustrated Guide.
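The agent talks to whatever OpenAI-compatible endpoint you point it at. As a minimal sketch (not code from this repo): LM Studio's local server is assumed to be at its default http://localhost:1234/v1, and the model identifier is whatever your server exposes for the loaded Gemma model.

    # Sketch: exercise a local OpenAI-compatible server (e.g. LM Studio) with
    # the standard openai client. base_url, api_key, and the model name are
    # assumptions for illustration, not values taken from this repository.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed-locally")

    response = client.chat.completions.create(
        model="gemma-3n-e4b",  # use the identifier your local server lists
        messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    )
    print(response.choices[0].message.content)

If this round-trips, the bot's LLM stage has a working local endpoint to call.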

Highlighted Details

  • Achieves sub-800ms voice-to-voice latency on M-series Macs.
  • Employs serverless WebRTC for low-latency, real-time audio communication.
  • Supports local, OpenAI-compatible LLM servers like LM Studio.

Maintenance & Community

The project is maintained by kwindla. Further community and contribution details are not specified in the README.

Licensing & Compatibility

The repository's licensing is not explicitly stated in the provided README content. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

Initial startup can exceed 30 seconds while models are downloaded and loaded; the README suggests setting HF_HUB_OFFLINE=1 to speed up subsequent startups once the models are cached. The license, and with it suitability for commercial use, also remains unstated.
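As a sketch of that offline flag (HF_HUB_OFFLINE is the standard huggingface_hub setting; placing it before any Hub-touching imports is the only requirement, and the commented bot import is illustrative):

    # Sketch: skip Hugging Face Hub network checks once models are cached.
    # HF_HUB_OFFLINE must be set before importing libraries that use the Hub.
    import os

    os.environ["HF_HUB_OFFLINE"] = "1"

    # ...then import and run the agent as usual, e.g.:
    # import bot

Setting the variable in the shell before launching bot.py has the same effect.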

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 109 stars in the last 30 days
