Asterisk-AI-Voice-Agent by hkjarral

AI voice agent for Asterisk/FreePBX telephony

Created 4 months ago
672 stars

Top 50.4% on SourcePulse

Project Summary

This project provides an open-source AI voice agent designed to integrate seamlessly with Asterisk and FreePBX telephony systems. It offers a flexible, modular pipeline architecture, allowing users to mix and match Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) providers, catering to diverse needs from enterprise deployments to privacy-focused local setups. The agent enables advanced AI-driven telephony actions, enhancing communication systems with intelligent automation.

How It Works

The core of the system is a modular pipeline architecture that decouples STT, LLM, and TTS components, enabling flexible provider selection (cloud, local, or hybrid). It integrates with Asterisk via the Asterisk REST Interface (ARI) and supports both AudioSocket and RTP (ExternalMedia) transports. A two-container architecture separates the ai-engine orchestrator from an optional local-ai-server for on-premises model execution. This design prioritizes flexibility, performance, and the ability to implement advanced features such as tool calling and real-time barge-in.
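
As an illustrative sanity check (not taken from the project's documentation), the ARI connection the agent depends on can be probed with curl before starting the containers; the host name, port 8088 (Asterisk's default HTTP port), and the ariuser:aripass credentials below are placeholders that must match your http.conf and ari.conf.

    # Query Asterisk's ARI for basic system info (host, port, and credentials are placeholders)
    curl -s -u ariuser:aripass http://asterisk-host:8088/ari/asterisk/info

    # List registered ARI (Stasis) applications; an ARI-driven agent appears here once connected
    curl -s -u ariuser:aripass http://asterisk-host:8088/ari/applications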

Quick Start & Requirements

Installation is primarily Docker-based. The recommended quick start is to clone the repository, run sudo ./preflight.sh --apply-fixes to set up the environment, and start the Admin UI with docker compose up -d --build admin-ui, then access the UI at http://localhost:3003 (default login: admin/admin). Key requirements are Docker with Docker Compose v2, Asterisk 18+ with ARI enabled, and a Linux operating system (Ubuntu 20.04+, Debian 11+, RHEL/Rocky/Alma 8+, Fedora 38+, or Sangoma Linux). Only the x86_64 (AMD64) architecture is supported; ARM64 is not. Minimum system requirements vary by deployment: cloud setups need 2+ cores and 4 GB RAM, while Local Hybrid requires 4+ cores and 8 GB+ RAM. Detailed guides are linked from the official documentation.
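
The quick start above condenses to the following commands; the repository URL is inferred from the project name and owner, so verify it against the GitHub page.

    # Clone the repository (URL inferred from the project name and owner)
    git clone https://github.com/hkjarral/Asterisk-AI-Voice-Agent.git
    cd Asterisk-AI-Voice-Agent

    # Check the host and apply recommended fixes, per the project's quick start
    sudo ./preflight.sh --apply-fixes

    # Build and start the Admin UI, then browse to http://localhost:3003 (default login: admin/admin)
    docker compose up -d --build admin-ui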

Highlighted Details

  • Modular Pipeline: Mix and match STT, LLM, and TTS providers (e.g., OpenAI, Deepgram, Google Live API, ElevenLabs, Ollama, Vosk, Piper).
  • Production-Ready Baselines: Five pre-configured "golden baselines" for enterprise deployment, including a privacy-focused "Local Hybrid" option.
  • AI-Powered Actions: Supports tool calling for telephony actions like call transfers, voicemail, email summaries, and transcript requests.
  • Admin UI v1.0: A modern web interface for configuration, system management, real-time metrics, and per-call debugging via Call History.
  • Local LLM Support: Integration with Ollama allows for fully self-hosted LLM processing, enhancing privacy and reducing costs (see the sketch after this list).
  • Dual Transport Support: Compatible with both AudioSocket and ExternalMedia RTP.
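
As a rough sketch of the local LLM path referenced above (not taken from the project's documentation), a model can be staged with the standard Ollama CLI before the pipeline is pointed at it; the model name is an arbitrary example, and 11434 is Ollama's default port.

    # Pull a model with the standard Ollama CLI (model name is only an example)
    ollama pull llama3.1

    # Ollama serves an HTTP API on port 11434 by default; confirm the model is listed
    curl -s http://localhost:11434/api/tags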

Maintenance & Community

The project maintains an active community through a Discord server, with GitHub Issues for bug reports and GitHub Discussions for general questions. Its frequent release cadence (e.g., v4.6.0) indicates ongoing development and support.

Licensing & Compatibility

The project is licensed under the permissive MIT License, allowing for broad use, modification, and distribution, including in commercial and closed-source applications without significant restrictions.

Limitations & Caveats

The primary limitation is the strict x86_64 (AMD64) requirement: ARM64 platforms (such as Apple Silicon Macs or Raspberry Pi) are not supported. The system also requires a Linux operating system with systemd.
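
These constraints can be checked up front with standard Linux commands (nothing project-specific):

    # Must print x86_64; aarch64 (ARM64) hosts are not supported
    uname -m

    # Print the name of PID 1 to confirm systemd is the init system
    ps -p 1 -o comm=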

Health Check

  • Last Commit: 13 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 22
  • Issues (30d): 19
  • Star History: 512 stars in the last 30 days
