AIVoiceChat by KoljaB

Voice chat for low-latency AI companion interaction

Created 2 years ago
307 stars

Top 87.3% on SourcePulse

Project Summary

This project provides a low-latency AI voice chat companion for real-time spoken interaction with AI models. It targets users who want a responsive, natural voice-based AI experience, and it is built on faster_whisper speech recognition and ElevenLabs speech synthesis.

How It Works

The system uses faster_whisper for speech-to-text conversion and ElevenLabs' streaming API for text-to-speech synthesis. Together they process spoken input and produce spoken AI responses in near real time, keeping latency low enough for a fluid conversation. Two scripts are provided: voice_talk_vad.py detects speech automatically, while voice_talk.py uses the spacebar to start and stop recording manually.
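
The flow described above (speech-to-text, response generation, streamed text-to-speech) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `speakable_chunks`, `run_turn`, and the three injected callables (`transcribe`, `generate`, `speak`) are hypothetical names standing in for faster_whisper, the AI model, and ElevenLabs respectively. The sketch shows one common way to keep latency low, which is to hand text to the speech synthesizer phrase by phrase instead of waiting for the complete reply.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def speakable_chunks(tokens: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Group streamed text tokens into short phrases suitable for streaming TTS.

    Hypothetical helper: a phrase is flushed once it is at least `min_chars`
    long and ends with sentence punctuation.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        if len(buffer) >= min_chars and buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the token stream ends.
    if buffer.strip():
        yield buffer.strip()


def run_turn(
    audio_path: str,
    transcribe: Callable[[str], str],         # stand-in for speech-to-text
    generate: Callable[[str], Iterable[str]],  # stand-in for a streaming AI reply
    speak: Callable[[str], None],              # stand-in for streaming text-to-speech
) -> Tuple[str, List[str]]:
    """Run one conversational turn and return (user text, spoken phrases)."""
    user_text = transcribe(audio_path)
    spoken: List[str] = []
    for phrase in speakable_chunks(generate(user_text)):
        speak(phrase)  # playback can start while later tokens are still arriving
        spoken.append(phrase)
    return user_text, spoken
```

In a real setup the three callables would wrap the actual libraries; here they can be replaced with simple stubs (for example `speak=print`) to try the control flow without API keys or a microphone.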

Quick Start & Requirements

  • Install dependencies: pip install openai elevenlabs pyaudio wave keyboard faster_whisper numpy torch
  • Requires OpenAI and ElevenLabs API keys.
  • Run with python voice_talk_vad.py or python voice_talk.py.
  • Demo available at: [Link to Demo Video]
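
Because both API keys are required, a small pre-flight check can catch a missing key before any audio is recorded. The sketch below is an assumption-laden example: the environment variable names `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` and the helper `missing_keys` are illustrative, and the project's scripts may expect the keys to be supplied in a different way.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names; adjust to however the scripts actually read their keys.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit("Missing API keys: " + ", ".join(missing))
    print("Both API keys are set.")
```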

Highlighted Details

  • Achieves low latency through faster_whisper and ElevenLabs input streaming.
  • Offers two distinct interaction modes: automatic speech detection or manual spacebar control.
  • Built with core libraries including openai, elevenlabs, pyaudio, keyboard, and faster_whisper.
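
To give a feel for what automatic speech detection involves, here is a minimal energy-threshold sketch using only the standard library. It is an assumption, not a description of voice_talk_vad.py: the source does not say which detection method the script uses, and the names `rms` and `detect_utterance`, the threshold of 500, and the three-frame silence limit are all illustrative choices. Frames are assumed to be 16-bit signed PCM in native byte order.

```python
import array
import math
from typing import Iterable, Optional, Tuple


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of one frame of 16-bit signed PCM samples."""
    samples = array.array("h", frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_utterance(
    frames: Iterable[bytes],
    threshold: float = 500.0,    # illustrative loudness cutoff
    max_silent_frames: int = 3,  # consecutive quiet frames that end the utterance
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first utterance, end exclusive.

    Returns None if no frame reaches the threshold.
    """
    start = None
    last_voiced = None
    silent = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i        # first loud frame opens the utterance
            last_voiced = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= max_silent_frames:
                break            # enough trailing silence: utterance is over
    if start is None:
        return None
    return start, last_voiced + 1
```

The manual mode avoids this logic entirely by letting the user mark the start and end of the recording with the spacebar, which trades convenience for predictability in noisy rooms.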

Maintenance & Community

The project acknowledges the developers of faster_whisper, ElevenLabs, and OpenAI. Contributions are welcome via pull requests; for significant changes, opening an issue is encouraged.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Performance depends on internet connection speed; the demo was recorded on a 10 Mbit/s connection. The project does not specify which operating systems or hardware configurations it supports beyond a standard Python environment.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
