voicechat2 by lhl

Local AI voicechat using WebSockets

Created 1 year ago

748 stars

Top 46.5% on SourcePulse

Project Summary

voicechat2 is a fast, fully local AI voice chat system built on WebSockets, aimed at users who want to build or experiment with real-time conversational AI agents. Its Speech-to-Text (SRT, the project's abbreviation), Large Language Model (LLM), and Text-to-Speech (TTS) components are modular, so each one can be swapped independently to customize the voice chat experience while keeping latency low.

How It Works

The system is built from swappable SRT, LLM, and TTS backends. Communication runs over WebSockets, which keeps remote access simple. Voice Activity Detection (VAD) is provided by ricky0123/vad, and Opus audio support by symblai/opus-encdec. Backends can be drawn from popular libraries such as whisper.cpp, faster-whisper, llama.cpp, Coqui TTS, StyleTTS2, and Piper.
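The swappable-backend idea can be illustrated with a minimal sketch. This is not the project's actual code: the class and method names (VoicePipeline, transcribe, reply, synthesize) and the three stand-in backends are hypothetical, chosen only to show how each stage can be replaced without touching the others.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, prompt: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Chains the three stages; any object with the right method fits a slot."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def respond(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)    # speech -> text
        answer = self.llm.reply(text)        # text -> reply text
        return self.tts.synthesize(answer)   # reply text -> audio bytes


# Stand-in backends for illustration only (hypothetical). Real ones would
# wrap a library such as whisper.cpp, llama.cpp, or Piper.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UppercaseLLM:
    def reply(self, prompt: str) -> str:
        return prompt.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = VoicePipeline(stt=EchoSTT(), llm=UppercaseLLM(), tts=BytesTTS())
print(pipeline.respond(b"hello"))  # b'HELLO'
```

Because the pipeline depends only on the three method signatures, replacing one backend is a one-argument change at construction time.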

Quick Start & Requirements

Install: Clone the repository, create a Python 3.11 environment (recommended: mamba), install requirements (pip install -r requirements.txt), and build llama.cpp with ROCm (GGML_HIPBLAS=1) or CUDA (GGML_CUDA=1) support.
Prerequisites: Ubuntu LTS, ROCm or CUDA setup, Python 3.11, byobu, curl, wget, espeak-ng, ffmpeg, libopus0, libopus-dev.
Models: Requires downloading GGUF models for LLMs (e.g., Meta-Llama-3-8B-Instruct-Q4_K_M.gguf).
Resources: Voice-to-voice latency is around 1 second on an AMD RDNA3 (7900-class) GPU and as low as 300 ms on an RTX 4090 with specific models.
Links: voicechat2 demo video (unmute to hear audio), Hackster.io writeup
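Before installing, it may help to confirm that the listed prerequisites are present. The following is a rough sketch, not part of the project: the function names missing_tools and python_matches are hypothetical, and the check only covers the command-line tools from the list above (libopus0 and libopus-dev are libraries, so they are not detected this way).

```python
import shutil
import sys

# Command-line tools taken from the prerequisites list.
REQUIRED_TOOLS = ["byobu", "curl", "wget", "espeak-ng", "ffmpeg"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_matches(required=(3, 11), version=None):
    """True when the major.minor version equals `required`."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(required)


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All listed command-line tools were found.")

if not python_matches():
    print("Python 3.11 is expected; running", sys.version.split()[0])
```

The `which` and `version` parameters exist only so the checks can be exercised without depending on the machine they run on.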

Highlighted Details

Achieves sub-second latency for voice-to-voice conversations on high-end GPUs.
Supports multiple SRT (whisper.cpp, faster-whisper, HF Transformers), LLM (llama.cpp, OpenAI API), and TTS (Coqui TTS, StyleTTS2, Piper, MeloTTS) backends.
Includes convenience scripts (run-voicechat2.sh, remote-tunnel.sh, local-tunnel.sh) for deployment and remote access.
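To see where voice-to-voice latency is spent, each stage can be timed separately. The sketch below is an illustration under stated assumptions, not the project's own instrumentation: LatencyLog is a hypothetical helper, and the time.sleep calls stand in for real SRT, LLM, and TTS backend calls.

```python
import time
from contextlib import contextmanager


class LatencyLog:
    """Records wall-clock seconds spent in each named stage."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.stages = {}

    @contextmanager
    def stage(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Store the elapsed time even if the stage raised an exception.
            self.stages[name] = self.clock() - start

    def total(self):
        return sum(self.stages.values())


log = LatencyLog()
for name, delay in [("srt", 0.01), ("llm", 0.02), ("tts", 0.01)]:
    with log.stage(name):
        time.sleep(delay)  # stand-in for the real backend call

for name, seconds in log.stages.items():
    print(f"{name}: {seconds * 1000:.0f} ms")
print(f"total: {log.total() * 1000:.0f} ms")
```

Summing the stages assumes they run strictly one after another. A system that overlaps or streams stages would need to measure time to first audio instead.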

Maintenance & Community

The project is maintained by lhl. No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license for the voicechat2 repository itself. However, it references other projects with MIT and Apache 2.0 licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Installation instructions are specific to Ubuntu LTS and assume prior ROCm/CUDA setup. The project does not specify a license, which may impact commercial adoption. Some referenced related projects have unclear or missing licenses.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days