CAAL by CoreWorxLab

Local voice assistant with extensible tool capabilities

Created 2 months ago
337 stars

Top 82.0% on SourcePulse

Project Summary

CAAL is a local, extensible voice assistant designed for users seeking privacy and customization. It integrates seamlessly with Home Assistant and allows for dynamic capability expansion through auto-discovered n8n workflows, acting as a flexible platform for smart home control and automation.

How It Works

Built on LiveKit Agents, CAAL processes voice commands through a pipeline of wake word detection (OpenWakeWord), speech-to-text (Speaches, Ollama, or Groq), large language model inference (Ollama or Groq), and text-to-speech (Kokoro or Piper). Its core innovation is exposing any n8n workflow as a tool via MCP (Model Context Protocol), letting users add custom functionality without modifying the assistant itself. Home Assistant integration is achieved through simplified MCP tools for device control and state querying.

Quick Start & Requirements

Installation involves cloning the repository, copying .env.example to .env, and configuring CAAL_HOST_IP. Deployment is primarily Docker-based:

  • GPU Mode (NVIDIA Linux): docker compose up -d. Requires Docker with NVIDIA Container Toolkit; 12GB+ VRAM recommended.
  • CPU-Only Mode: docker compose -f docker-compose.cpu.yaml up -d. Utilizes Groq for LLM/STT and Piper for TTS, requiring no GPU.
  • Apple Silicon: ./start-apple.sh leverages mlx-audio.
  • Distributed: Refer to docs/DISTRIBUTED-DEPLOYMENT.md.

After setup, access the configuration wizard at http://YOUR_SERVER_IP:3000.
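The setup steps above can be collected into one shell session. This is a sketch: the clone URL is assumed from the project and organization names shown on this page, and the CAAL_HOST_IP value is an example.

```shell
# Clone the repository (URL assumed from the project/org names on this page)
git clone https://github.com/CoreWorxLab/CAAL.git
cd CAAL

# Copy the example environment file and set the host IP
cp .env.example .env
# Edit .env and set CAAL_HOST_IP to this machine's LAN address, e.g.:
# CAAL_HOST_IP=192.168.1.50

# GPU mode (NVIDIA Linux; requires the NVIDIA Container Toolkit):
docker compose up -d

# CPU-only mode (Groq for LLM/STT, Piper for TTS; no GPU needed):
# docker compose -f docker-compose.cpu.yaml up -d
```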

Highlighted Details

  • Flexible Providers: Supports local Ollama or cloud Groq for LLM/STT, and Kokoro or Piper for TTS.
  • Home Assistant Control: Native MCP integration with hass_control and hass_get_state tools for seamless smart home management.
  • n8n Workflow Expansion: Any n8n workflow can be dynamically integrated as a tool for the LLM.
  • Wake Word: "Hey Cal" activation via server-side OpenWakeWord.
  • Web Search: Integrated DuckDuckGo for real-time information retrieval.
  • Webhook API: Exposes REST endpoints (/announce, /wake, /reload-tools) on port 8889 for external integrations.
  • Mobile App: A Flutter-based Android and iOS client is available via GitHub Releases.
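The webhook endpoints listed above can be exercised with curl. A sketch, not a confirmed API contract: the port and paths come from the README, but the JSON payload shape for /announce is an assumption, since request bodies are not documented in this excerpt.

```shell
# Webhook API base (port 8889 per the README); replace with your host.
CAAL_URL="http://YOUR_SERVER_IP:8889"

# Speak an announcement through the assistant
# (the "text" field name is an assumption, not documented here):
curl -sf --max-time 5 -X POST "$CAAL_URL/announce" \
     -H 'Content-Type: application/json' \
     -d '{"text": "Dinner is ready"}' || echo "announce request failed"

# Trigger listening as if the wake word had been spoken:
curl -sf --max-time 5 -X POST "$CAAL_URL/wake" || echo "wake request failed"

# Re-discover n8n workflows without restarting the stack:
curl -sf --max-time 5 -X POST "$CAAL_URL/reload-tools" || echo "reload request failed"
```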

Maintenance & Community

The README does not detail specific contributors, sponsorships, or community channels like Discord/Slack. It lists several related projects, including LiveKit Agents, Speaches, Kokoro-FastAPI, Piper, mlx-audio, Ollama, Groq, n8n, and Home Assistant, suggesting an active ecosystem.

Licensing & Compatibility

The project is released under the permissive MIT License, allowing for commercial use and integration into closed-source applications. Note that HTTPS is required for voice access from non-localhost network devices.

Limitations & Caveats

GPU mode necessitates an NVIDIA GPU with the NVIDIA Container Toolkit and ideally 12GB+ VRAM. The CPU-only mode relies on Groq's cloud API for voice processing, making it not fully local. Initial setup may be slow due to model downloads. Ollama requires specific network binding configurations when run within Docker.
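For the Ollama binding caveat, the usual remedy is to make a host-side Ollama listen on all interfaces so containers can reach it. OLLAMA_HOST is Ollama's standard environment variable; the container-side base URL below is an assumption that depends on your Docker network.

```shell
# Bind Ollama to all interfaces instead of 127.0.0.1 so Docker
# containers can reach it (OLLAMA_HOST is Ollama's standard env var):
OLLAMA_HOST=0.0.0.0:11434 ollama serve

# From inside a container, point the client at the Docker host
# (host.docker.internal on Docker Desktop; the bridge gateway IP,
# often 172.17.0.1, on plain Linux Docker):
# OLLAMA_BASE_URL=http://host.docker.internal:11434
```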

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 13
  • Issues (30d): 11
  • Star History: 91 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Yaowei Zheng (author of LLaMA-Factory).

AstrBot by AstrBotDevs

LLM chatbot/framework for multiple platforms

  • 6.6%
  • 18k stars
  • Created 3 years ago, updated 17 hours ago