Local AI voicechat using WebSockets
This project provides a fast, fully local AI voice chat system using WebSockets, targeting users who want to build or experiment with real-time, conversational AI agents. It offers modularity for the Speech-to-Text (STT), Large Language Model (LLM), and Text-to-Speech (TTS) components, enabling customizable, low-latency voice chat experiences.
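To make the modularity concrete, the pipeline can be thought of as three narrow interfaces composed once per conversational turn. The names below are illustrative, not taken from the repository:

```python
# Sketch of the swap-able backend idea (illustrative names, not the
# project's actual classes): each stage is a minimal interface so
# different STT/LLM/TTS engines can be slotted in behind it.
from typing import Protocol

class STTBackend(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LLMBackend(Protocol):
    def complete(self, prompt: str) -> str: ...

class TTSBackend(Protocol):
    def synthesize(self, text: str) -> bytes: ...

def run_turn(audio: bytes, stt: STTBackend, llm: LLMBackend, tts: TTSBackend) -> bytes:
    """One conversational turn: speech in, synthesized speech out."""
    return tts.synthesize(llm.complete(stt.transcribe(audio)))
```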
How It Works
The system employs a modular architecture, allowing users to swap out STT, LLM, and TTS backends. It uses WebSockets for communication, enabling simple remote access. Key components include Voice Activity Detection (VAD) via ricky0123/vad and Opus audio support via symblai/opus-encdec. The modular design allows integration with popular libraries such as whisper.cpp, faster-whisper, llama.cpp, Coqui TTS, StyleTTS2, and Piper.
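As an illustration of the WebSocket transport, here is a minimal client sketch using the Python websockets library. The endpoint path and the binary message framing are assumptions for illustration only; the actual protocol is defined by the voicechat2 server.

```python
# Hypothetical client: send one Opus-encoded utterance to the server
# and save the synthesized reply. Endpoint and framing are illustrative.
import asyncio
import websockets

async def main():
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        # Send captured microphone audio as a binary frame.
        with open("utterance.opus", "rb") as f:
            await ws.send(f.read())
        # Receive the TTS reply as binary audio and write it out.
        reply = await ws.recv()
        with open("reply.opus", "wb") as f:
            f.write(reply)

asyncio.run(main())
```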
Quick Start & Requirements
Install the Python dependencies (pip install -r requirements.txt) and build llama.cpp with ROCm (GGML_HIPBLAS=1) or CUDA (GGML_CUDA=1) support. Required system packages: byobu, curl, wget, espeak-ng, ffmpeg, libopus0, libopus-dev. Download a quantized GGUF model for the LLM backend (e.g., Meta-Llama-3-8B-Instruct-Q4_K_M.gguf).
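After building, a quick way to confirm the model serves correctly is to query llama.cpp's bundled HTTP server directly. This is a minimal sanity-check sketch, not part of voicechat2 itself; it assumes the server is already running on its default port (8080) with the downloaded GGUF loaded.

```python
# Sanity check: query a locally running llama.cpp HTTP server via its
# OpenAI-compatible chat endpoint. Assumes something like
# `llama-server -m Meta-Llama-3-8B-Instruct-Q4_K_M.gguf` is running.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Reply with one short sentence."}],
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```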
Highlighted Details
Includes helper scripts (run-voicechat2.sh, remote-tunnel.sh, local-tunnel.sh) for deployment and remote access.
Maintenance & Community
The project is maintained by lhl. No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license for the voicechat2 repository itself. However, it references other projects with MIT and Apache 2.0 licenses. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Installation instructions are specific to Ubuntu LTS and assume prior ROCm/CUDA setup. The project does not specify a license, which may impact commercial adoption. Some referenced related projects have unclear or missing licenses.