local-voice-ai by ShayneP

Local AI voice assistant with real-time speech and text capabilities

Created 8 months ago
359 stars

Top 78.1% on SourcePulse

View on GitHub
Project Summary

This project provides a fully containerized, local AI voice assistant with real-time speech-to-text, large language model interaction, and text-to-speech synthesis. It targets developers and power users who want a self-hosted, customizable voice AI stack, with the benefits of local data processing and full control over the pipeline.

How It Works

The system orchestrates multiple services with Docker Compose: LiveKit for WebRTC signaling, a Python agent built on the LiveKit SDK, Whisper (served via VoxBox) for speech-to-text, llama.cpp for running local LLMs, and Kokoro for text-to-speech. The agent drives the pipeline, routing captured speech to Whisper for transcription, transcripts to llama.cpp for generation, and responses to Kokoro for synthesis. It also supports Retrieval-Augmented Generation (RAG): documents are embedded with Sentence Transformers and indexed with FAISS for efficient knowledge retrieval during conversations.
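The RAG step described above can be illustrated in a few lines of Python. This is a minimal sketch, not the project's actual code: the embedding model, sample documents, and top-k value are assumptions.

```python
# Minimal RAG indexing/retrieval sketch (illustrative only; model choice
# and documents are assumptions, not taken from the project).
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "LiveKit handles WebRTC signaling between the browser and the agent.",
    "llama.cpp serves the local LLM behind an OpenAI-compatible endpoint.",
    "Kokoro synthesizes the agent's replies back into speech.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = model.encode(docs, normalize_embeddings=True)

# Normalized vectors + inner product = cosine similarity search.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

query = model.encode(["How does the assistant talk back?"], normalize_embeddings=True)
scores, ids = index.search(query, 2)  # retrieve the top-2 passages
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {docs[i]}")
```

Retrieved passages would then be prepended to the LLM prompt so the model can ground its answer in the indexed documents.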

Quick Start & Requirements

  • Primary install/run command: run the ./test.sh script, which cleans, builds, and launches the full stack; then open the UI at http://localhost:3000 (a quick readiness check is sketched after this list).
  • Prerequisites: Docker and Docker Compose. No GPU is required; the stack runs CPU-based models.
  • Recommended RAM: 12GB or more.
  • Relevant pages: http://localhost:3000 (local UI access).
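After ./test.sh finishes, you can confirm the stack is up by polling the UI port. The sketch below is an illustrative helper, not part of the project; only the port comes from the README, and the retry logic is an assumption.

```python
# Poll the local UI until it responds or we give up (illustrative helper).
# Uses only the Python standard library.
import time
import urllib.request
from urllib.error import URLError

URL = "http://localhost:3000"  # UI port from the README

for attempt in range(30):
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            print(f"UI is up (HTTP {resp.status}) after {attempt + 1} attempt(s)")
            break
    except (URLError, TimeoutError):
        time.sleep(2)
else:
    print("UI did not come up; check `docker compose logs`")
```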

Highlighted Details

  • Full-stack, Dockerized architecture for easy deployment.
  • Real-time voice processing pipeline: STT (Whisper), LLM (llama.cpp), TTS (Kokoro).
  • Integrated Retrieval-Augmented Generation (RAG) for enhanced knowledge retrieval.
  • Local execution of LLMs via llama.cpp, avoiding external API dependencies (see the client sketch below).
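Because llama.cpp's server exposes an OpenAI-compatible API, the agent (or any client) can talk to the local model with a standard client library. A minimal sketch follows; the port and model name are assumptions, not values taken from the project's compose file.

```python
# Sketch: querying the local llama.cpp server through its OpenAI-compatible
# endpoint. llama.cpp's llama-server defaults to port 8080; the "model"
# field is a placeholder since the server runs whichever model it loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```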

Maintenance & Community

The project builds on components from LiveKit, llama.cpp, and Kokoro. The README does not list community channels or contributor information beyond these core technologies.

Licensing & Compatibility

The README does not specify a software license. This should be clarified before adoption, especially for commercial use or integration into closed-source projects.

Limitations & Caveats

The system relies on CPU-based models, which may limit performance and response times for complex tasks. The recommended minimum of 12GB RAM indicates a significant resource footprint, and the absence of explicit licensing information is a potential adoption blocker.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 2
Star History

230 stars in the last 30 days

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Michael Han (cofounder of Unsloth), and 1 more.

Explore Similar Projects

Orpheus-TTS by canopyai

Open-source TTS for human-sounding speech, built on Llama-3b

Created 10 months ago
Updated 1 month ago
6k stars

Top 0.2% on SourcePulse