voice-chat-ai by bigsk1

Voice chat app for interacting with AI characters using speech

Created 1 year ago

367 stars

Top 76.9% on SourcePulse

Project Summary

This project provides a web UI and CLI for real-time voice chat with AI characters. Chat can run locally via Ollama or through the OpenAI, Anthropic, or xAI APIs, and speech can come from local XTTS or Kokoro or from the OpenAI or ElevenLabs APIs. It targets users seeking interactive AI conversations, role-playing, or AI-assisted games and stories, and lets them choose freely among model and voice providers.

How It Works

The application uses a modular architecture that lets users mix and match LLM providers (OpenAI, Anthropic, xAI, Ollama) with various TTS and STT services. It supports the OpenAI Realtime API over WebRTC for real-time, interruptible conversations, and OpenAI's enhanced TTS models for expressive speech. Local models such as XTTS and Faster Whisper are also integrated, with optional GPU acceleration via CUDA. Sentiment analysis is used to adapt AI responses to the user's mood.
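The mix-and-match idea can be pictured as validating two independent choices and combining them into one pipeline. The sketch below is illustrative only: the config keys `CHAT_PROVIDER` and `TTS_PROVIDER`, the `Pipeline` class, and `build_pipeline` are hypothetical names, not voice-chat-ai's real settings or code; only the provider names come from the summary above.

```python
"""Hypothetical sketch of combining a chat provider and a speech provider.

Config keys and class names are illustrative assumptions, not the
project's actual configuration.
"""
from dataclasses import dataclass
from typing import Mapping

# Provider names taken from the project summary.
CHAT_PROVIDERS = {"ollama", "openai", "anthropic", "xai"}
TTS_PROVIDERS = {"xtts", "openai", "elevenlabs", "kokoro"}
# Backends that can run on the user's own machine without a cloud API.
LOCAL_BACKENDS = {"ollama", "xtts", "kokoro"}


@dataclass
class Pipeline:
    chat: str    # LLM backend that generates the character's reply
    tts: str     # speech backend that voices the reply
    local: bool  # True only if both backends run locally


def build_pipeline(config: Mapping[str, str]) -> Pipeline:
    """Validate the chosen providers and combine them into one pipeline."""
    chat = config.get("CHAT_PROVIDER", "ollama").lower()
    tts = config.get("TTS_PROVIDER", "xtts").lower()
    if chat not in CHAT_PROVIDERS:
        raise ValueError(f"unknown chat provider: {chat}")
    if tts not in TTS_PROVIDERS:
        raise ValueError(f"unknown TTS provider: {tts}")
    return Pipeline(chat=chat, tts=tts,
                    local=chat in LOCAL_BACKENDS and tts in LOCAL_BACKENDS)


if __name__ == "__main__":
    print(build_pipeline({"CHAT_PROVIDER": "anthropic", "TTS_PROVIDER": "kokoro"}))
```

Keeping the two choices independent is what allows any chat backend to be paired with any voice backend.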

Quick Start & Requirements

Install: Clone the repository, create a Python 3.10 virtual environment, and install dependencies (pip install -r requirements.txt, or requirements_cpu.txt for a CPU-only setup).
Prerequisites: Python 3.10, ffmpeg, a chat provider (Ollama, OpenAI, xAI, Anthropic), a speech provider (XTTS, OpenAI API, ElevenLabs API, Kokoro TTS), Microsoft C++ Build Tools (on Windows, required for XTTS), and a microphone. CUDA and cuDNN are recommended for GPU acceleration.
Docker: Pre-built images are available for CPU and GPU.
Docs: Games Documentation, Stories Documentation
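Before installing, the two prerequisites the standard library can see (the interpreter version and an ffmpeg binary) can be checked up front. This is a minimal sketch, not a script that ships with voice-chat-ai; `check_prerequisites` is a hypothetical helper, and it does not check providers, build tools, or CUDA.

```python
"""Hypothetical pre-flight check for Python 3.10 and ffmpeg on PATH."""
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def check_prerequisites(
    version: Tuple[int, int] = tuple(sys.version_info[:2]),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # The project summary asks for a Python 3.10 environment.
    if tuple(version[:2]) != (3, 10):
        problems.append(f"Python 3.10 expected, found {version[0]}.{version[1]}")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("all checks passed" if not issues else "\n".join(issues))
```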

Highlighted Details

Supports the OpenAI Realtime API for conversation without explicit turn-taking and near-instant responses.
Features 15+ interactive game types and immersive story adventures.
Allows adding custom characters with specific prompts and mood-based responses.
Offers OpenAI Enhanced Mode with expressive TTS models like gpt-4o-mini-tts.
Integrates local transcription via Faster Whisper.
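Custom characters with mood-based responses can be sketched as a base prompt plus an extra instruction chosen by the detected mood. In this toy version a small keyword lexicon stands in for a real sentiment analyser; the `Character` fields, mood labels, and prompt wording are hypothetical and are not the project's character format.

```python
"""Toy sketch of mood-aware character prompts (keyword lexicon, not a real model)."""
import re
from dataclasses import dataclass, field
from typing import Dict

POSITIVE = {"great", "love", "happy", "awesome", "thanks"}
NEGATIVE = {"sad", "angry", "hate", "awful", "tired"}


def classify_mood(text: str) -> str:
    """Label text 'positive', 'negative' or 'neutral' by counting keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


@dataclass
class Character:
    name: str
    base_prompt: str
    # Extra instruction appended to the system prompt for each detected mood.
    mood_prompts: Dict[str, str] = field(default_factory=dict)

    def system_prompt(self, user_text: str) -> str:
        extra = self.mood_prompts.get(classify_mood(user_text), "")
        return f"{self.base_prompt} {extra}".strip()


if __name__ == "__main__":
    wizard = Character(
        name="wizard",  # hypothetical example character
        base_prompt="You are a wise old wizard.",
        mood_prompts={
            "negative": "The user seems upset; answer gently and offer comfort.",
            "positive": "The user is cheerful; match their enthusiasm.",
        },
    )
    print(wizard.system_prompt("I am so tired and sad today"))
```

Swapping `classify_mood` for a proper sentiment model would not change the character side of the design.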

Maintenance & Community

The project is actively maintained by bigsk1. Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Local XTTS and Faster Whisper performance is significantly slower on CPU. CUDA setup for Docker requires specific NVIDIA toolkit and cuDNN installations. The README notes that sample character .wav files are of lower quality and can be replaced.
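Because local XTTS and Faster Whisper are much slower on CPU, a launcher would typically prefer CUDA and fall back to CPU with a warning. The sketch below assumes PyTorch is used as the CUDA probe and treats a missing `torch` install as "no CUDA"; `cuda_available` and `choose_device` are hypothetical helpers, not voice-chat-ai's actual device logic.

```python
"""Hypothetical GPU/CPU selection for local TTS/STT models."""
import importlib.util
import warnings
from typing import Optional


def cuda_available() -> bool:
    """Return True only if PyTorch is importable and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the sketch runs without PyTorch
    return torch.cuda.is_available()


def choose_device(cuda: Optional[bool] = None) -> str:
    """Pick 'cuda' when available, otherwise warn and fall back to 'cpu'."""
    if cuda is None:
        cuda = cuda_available()
    if cuda:
        return "cuda"
    warnings.warn("CUDA not available: local TTS/STT will run on CPU and be much slower")
    return "cpu"


if __name__ == "__main__":
    print("using device:", choose_device())
```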

Health Check

Last Commit

5 days ago

Responsiveness

1 day

Star History

11 stars in the last 30 days