voice-chat-ai by bigsk1

Voice chat app for interacting with AI characters using speech

created 1 year ago
294 stars

Top 90.9% on sourcepulse

Project Summary

This project provides a web UI and CLI for real-time voice chat with AI characters. For chat it supports local models via Ollama or the cloud providers OpenAI, Anthropic, and xAI. For speech it supports XTTS, OpenAI, ElevenLabs, or Kokoro. It targets users who want interactive AI conversations, role-playing, or AI-assisted games and stories, and it lets them choose their model and voice providers freely.

How It Works

The application uses a modular architecture, so users can mix and match LLM providers (OpenAI, Anthropic, xAI, Ollama) with various text-to-speech (TTS) and speech-to-text (STT) services. It supports OpenAI's Realtime API over WebRTC for real-time, interruptible conversations, and OpenAI's enhanced TTS models for expressive speech. Local models such as XTTS and Faster Whisper are also integrated, with optional GPU acceleration via CUDA. Sentiment analysis adapts the AI's responses to the user's mood.
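The mix-and-match design can be pictured as two lookup tables, one for chat providers and one for TTS providers, combined into a single reply pipeline. This is an illustrative sketch only. The function names, registry keys, and stub providers are hypothetical stand-ins that echo their input instead of calling Ollama, OpenAI, XTTS, or Kokoro. The project's real configuration mechanism is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stub providers: they only tag the text so the data flow is visible.
# Real implementations would call Ollama/OpenAI (chat) or XTTS/Kokoro (TTS).
def chat_ollama(prompt: str) -> str:
    return f"[ollama] {prompt}"

def chat_openai(prompt: str) -> str:
    return f"[openai] {prompt}"

def tts_xtts(text: str) -> bytes:
    return f"[xtts] {text}".encode()

def tts_kokoro(text: str) -> bytes:
    return f"[kokoro] {text}".encode()

# Registries keyed by a provider name (hypothetical keys, not the project's settings).
CHAT_PROVIDERS: Dict[str, Callable[[str], str]] = {
    "ollama": chat_ollama,
    "openai": chat_openai,
}
TTS_PROVIDERS: Dict[str, Callable[[str], bytes]] = {
    "xtts": tts_xtts,
    "kokoro": tts_kokoro,
}

@dataclass(frozen=True)
class Pipeline:
    chat: Callable[[str], str]
    speak: Callable[[str], bytes]

    def respond(self, user_text: str) -> bytes:
        # The chat provider writes the reply, then the TTS provider voices it.
        return self.speak(self.chat(user_text))

def build_pipeline(chat_provider: str, tts_provider: str) -> Pipeline:
    """Combine any registered chat provider with any registered TTS provider."""
    try:
        return Pipeline(CHAT_PROVIDERS[chat_provider], TTS_PROVIDERS[tts_provider])
    except KeyError as exc:
        raise ValueError(f"unknown provider: {exc.args[0]}") from None
```

For example, `build_pipeline("ollama", "kokoro").respond("hi")` yields `b"[kokoro] [ollama] hi"`. Swapping either half requires changing only one key, which is the point of keeping the two registries independent.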

Quick Start & Requirements

  • Install: Clone the repository, create a Python 3.10 virtual environment, and install the dependencies (pip install -r requirements.txt, or requirements_cpu.txt for CPU-only setups).
  • Prerequisites: Python 3.10, ffmpeg, a chat provider (Ollama, OpenAI, xAI, Anthropic), a speech provider (XTTS, OpenAI API, ElevenLabs API, Kokoro TTS), Microsoft C++ Build Tools (on Windows, required for XTTS), and a microphone. CUDA and cuDNN are recommended for GPU acceleration.
  • Docker: Pre-built images are available for CPU and GPU.
  • Docs: Games Documentation, Stories Documentation
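Before installing, a quick preflight script can confirm two of the listed prerequisites: the Python version and ffmpeg on the PATH. This is a hypothetical helper, not something shipped with the project. It assumes "Python 3.10" means exactly 3.10, as the install step suggests.

```python
import shutil
import sys
from typing import List, Optional, Tuple

EXPECTED_PYTHON = (3, 10)  # assumption: the README's Python 3.10 requirement

def find_problems(python_version: Tuple[int, int], ffmpeg_path: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means these basics look fine.

    Takes its inputs as arguments so the check can be tested without the real system.
    """
    problems: List[str] = []
    if tuple(python_version[:2]) != EXPECTED_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found; the README targets 3.10"
        )
    if ffmpeg_path is None:
        problems.append("ffmpeg not found on PATH")
    return problems

if __name__ == "__main__":
    # shutil.which returns None when the executable is not on PATH.
    for issue in find_problems(sys.version_info[:2], shutil.which("ffmpeg")):
        print("WARNING:", issue)
```

The script does not check API keys, the microphone, or build tools. Those depend on which providers and platform you choose.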

Highlighted Details

  • Supports the OpenAI Realtime API for low-latency, interruptible conversations without explicit turn-taking.
  • Features 15+ interactive game types and immersive story adventures.
  • Allows adding custom characters with specific prompts and mood-based responses.
  • Offers OpenAI Enhanced Mode with expressive TTS models like gpt-4o-mini-tts.
  • Integrates local transcription via Faster Whisper.
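Mood-based responses can be illustrated by scoring the user's message, mapping the score to a mood, and appending a matching instruction to the character's prompt. This is a hypothetical sketch. The tiny word lexicon stands in for a real sentiment analyzer, and the mood names and instruction texts are invented for illustration. The project's actual sentiment method and prompt format may differ.

```python
from typing import Dict

# Illustrative lexicon only; a real setup would use a proper sentiment model.
POSITIVE = {"great", "love", "happy", "awesome", "thanks"}
NEGATIVE = {"hate", "sad", "angry", "terrible", "awful"}

# Hypothetical per-mood instructions appended to a character prompt.
MOOD_INSTRUCTIONS: Dict[str, str] = {
    "positive": "The user seems upbeat; match their energy.",
    "negative": "The user seems upset; respond gently and supportively.",
    "neutral": "Respond in your usual character voice.",
}

def score(text: str) -> int:
    """Count positive words minus negative words, ignoring case and punctuation."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def classify_mood(text: str) -> str:
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"

def build_system_prompt(character_prompt: str, user_text: str) -> str:
    """Append the mood-specific instruction to the character's base prompt."""
    mood = classify_mood(user_text)
    return f"{character_prompt}\n\n{MOOD_INSTRUCTIONS[mood]}"
```

Keeping the mood instructions in a table separate from the character prompt lets each custom character reuse the same mood logic.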

Maintenance & Community

The project is actively maintained by bigsk1. Community support channels are not explicitly mentioned in the README.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Local XTTS and Faster Whisper run significantly slower on CPU than on a CUDA GPU. Using CUDA inside Docker requires the NVIDIA toolkit and cuDNN to be installed. The README notes that the sample character .wav files are of lower quality and can be replaced.
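Because of the CPU slowdown, a setup would typically choose Faster Whisper's device and precision based on whether a GPU is available. This is a sketch under stated assumptions. GPU detection uses PyTorch if it is installed and otherwise assumes no GPU. The float16-on-GPU and int8-on-CPU defaults are common choices, not settings taken from the project.

```python
from typing import Tuple

def whisper_settings(cuda_available: bool) -> Tuple[str, str]:
    """Pick (device, compute_type) for a Faster Whisper model.

    float16 is a common choice on CUDA GPUs; int8 quantization is a common
    way to cut CPU latency and memory use.
    """
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "int8"

def detect_cuda() -> bool:
    # Assumption: PyTorch is used only as a GPU probe; if it's missing, assume CPU.
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())

if __name__ == "__main__":
    device, compute_type = whisper_settings(detect_cuda())
    print(f"device={device} compute_type={compute_type}")
    # With faster-whisper installed, these map onto its constructor:
    # from faster_whisper import WhisperModel
    # model = WhisperModel("base", device=device, compute_type=compute_type)
```

Faster Whisper runs on CTranslate2, not PyTorch, so a GPU that PyTorch detects still needs a compatible CUDA and cuDNN setup for CTranslate2.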

Health Check
Last commit: 2 weeks ago
Responsiveness: 1 day
Pull Requests (30d): 4
Issues (30d): 1
Star History: 77 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Travis Fischer (Founder of Agentic).

RealtimeSTT by KoljaB

0.9%
8k
Speech-to-text library for realtime applications
created 1 year ago
updated 3 weeks ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Andre Zayarni (Cofounder of Qdrant), and 2 more.

RealChar by Shaunwei

0.1%
6k
Real-time AI character/companion creation and interaction codebase
created 2 years ago
updated 1 year ago