local-voice-ai by ShayneP

Local AI voice assistant with real-time speech and text capabilities

Created 8 months ago
359 stars

Top 78.1% on SourcePulse

View on GitHub
Project Summary

This project provides a fully containerized, local AI voice assistant with real-time speech-to-text, large language model interaction, and text-to-speech synthesis. It targets developers and power users who want a self-hosted, customizable voice AI stack, with the benefits of local data processing and full control over the pipeline.

How It Works

The system orchestrates multiple services with Docker Compose: LiveKit for WebRTC signaling, a Python agent built on the LiveKit SDK, Whisper (served via VoxBox) for speech-to-text, llama.cpp for running local LLMs, and Kokoro for text-to-speech. The agent drives the pipeline, routing captured speech to Whisper for transcription, transcripts to llama.cpp for generation, and responses to Kokoro for synthesis. It also supports Retrieval-Augmented Generation (RAG): documents are embedded with Sentence Transformers and indexed with FAISS for efficient knowledge retrieval during conversations.
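The RAG step described above can be illustrated in a few lines of Python. This is a minimal sketch, not the project's actual code: the embedding model, sample documents, and top-k value are assumptions.

```python
# Minimal RAG indexing/retrieval sketch (illustrative only; model choice
# and documents are assumptions, not taken from the project).
import faiss
from sentence_transformers import SentenceTransformer

docs = [
    "LiveKit handles WebRTC signaling between the browser and the agent.",
    "llama.cpp serves the local LLM behind an OpenAI-compatible endpoint.",
    "Kokoro synthesizes the agent's replies back into speech.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
embeddings = model.encode(docs, normalize_embeddings=True)

# Normalized vectors + inner product = cosine similarity search.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

query = model.encode(["How does the assistant talk back?"], normalize_embeddings=True)
scores, ids = index.search(query, 2)  # retrieve the top-2 passages
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {docs[i]}")
```

Retrieved passages would then be prepended to the LLM prompt so the model can ground its answer in the indexed documents.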

Quick Start & Requirements

  • Primary install/run command: run the ./test.sh script, which cleans, builds, and launches the full stack; then open the UI at http://localhost:3000 (a quick readiness check is sketched after this list).
  • Prerequisites: Docker and Docker Compose. No GPU is required; the stack runs CPU-based models.
  • Recommended RAM: 12GB or more.
  • Relevant pages: http://localhost:3000 (local UI access).
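After ./test.sh finishes, you can confirm the stack is up by polling the UI port. The sketch below is an illustrative helper, not part of the project; only the port comes from the README, and the retry logic is an assumption.

```python
# Poll the local UI until it responds or we give up (illustrative helper).
# Uses only the Python standard library.
import time
import urllib.request
from urllib.error import URLError

URL = "http://localhost:3000"  # UI port from the README

for attempt in range(30):
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            print(f"UI is up (HTTP {resp.status}) after {attempt + 1} attempt(s)")
            break
    except (URLError, TimeoutError):
        time.sleep(2)
else:
    print("UI did not come up; check `docker compose logs`")
```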

Highlighted Details

  • Full-stack, Dockerized architecture for easy deployment.
  • Real-time voice processing pipeline: STT (Whisper), LLM (llama.cpp), TTS (Kokoro).
  • Integrated Retrieval-Augmented Generation (RAG) for enhanced knowledge retrieval.
  • Local execution of LLMs via llama.cpp, avoiding external API dependencies (see the client sketch below).
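Because llama.cpp's server exposes an OpenAI-compatible API, the agent (or any client) can talk to the local model with a standard client library. A minimal sketch follows; the port and model name are assumptions, not values taken from the project's compose file.

```python
# Sketch: querying the local llama.cpp server through its OpenAI-compatible
# endpoint. llama.cpp's llama-server defaults to port 8080; the "model"
# field is a placeholder since the server runs whichever model it loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```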

Maintenance & Community

The project builds on components from LiveKit, llama.cpp, and Kokoro. The README does not list community channels or contributor information beyond these core technologies.

Licensing & Compatibility

The README does not specify a software license. This should be clarified before adoption, especially for commercial use or integration into closed-source projects.

Limitations & Caveats

The system relies on CPU-based models, which may limit performance and response times for complex tasks. The recommended minimum of 12GB RAM indicates a significant resource footprint, and the absence of explicit licensing information is a potential adoption blocker.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 2
Star History

230 stars in the last 30 days

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Michael Han (cofounder of Unsloth), and 1 more.

Explore Similar Projects

Orpheus-TTS by canopyai

Open-source TTS for human-sounding speech, built on Llama-3b

Created 10 months ago
Updated 1 month ago
6k stars

Top 0.2% on SourcePulse