Voice chat for low-latency AI companion interaction
This project provides a low-latency AI voice chat companion, enabling real-time spoken interaction with AI models. It is designed for users seeking a responsive and natural voice-based AI experience, leveraging advanced speech recognition and synthesis technologies.
How It Works
The system uses faster_whisper for efficient speech-to-text conversion and ElevenLabs' streaming API for text-to-speech synthesis. This combination allows near real-time processing of spoken input and generation of AI responses, minimizing latency for a more fluid conversation. Two modes are offered: voice_talk_vad.py for automatic speech detection and voice_talk.py for manual recording control via the spacebar.
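The end-to-end loop is easiest to picture as three stages: transcribe the recording, generate a reply, and stream it back as audio. The sketch below is illustrative rather than the repository's actual code; it assumes the OpenAI 1.x Python client and the pre-1.0 elevenlabs SDK's generate/stream helpers (call shapes differ between SDK versions), and the model size, voice name, and function name are placeholders.

```python
# Illustrative transcribe -> respond -> speak pipeline (not the project's code).
from faster_whisper import WhisperModel
from openai import OpenAI
from elevenlabs import generate, stream  # pre-1.0 elevenlabs SDK helpers

whisper = WhisperModel("base.en", device="cpu", compute_type="int8")
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def respond_to_recording(wav_path: str) -> None:
    # 1. Speech-to-text with faster_whisper
    segments, _ = whisper.transcribe(wav_path)
    user_text = " ".join(seg.text for seg in segments).strip()

    # 2. Generate a reply with the OpenAI chat API
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_text}],
    ).choices[0].message.content

    # 3. Stream the reply through ElevenLabs text-to-speech (plays as it arrives)
    stream(generate(text=reply, voice="Rachel", stream=True))
```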
Quick Start & Requirements
Install the dependencies: pip install openai elevenlabs pyaudio wave keyboard faster_whisper numpy torch
Then run python voice_talk_vad.py or python voice_talk.py.
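For the manual mode, the core idea is a spacebar-gated capture loop that writes a WAV file for transcription. The snippet below is a rough approximation of that idea, not code from voice_talk.py; the sample rate, chunk size, and output filename are assumptions.

```python
# Sketch of spacebar-controlled recording (illustrative, not voice_talk.py itself).
import wave
import keyboard
import pyaudio

RATE, CHUNK = 16000, 1024

def record_while_space_held(path: str = "input.wav") -> str:
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    sample_width = pa.get_sample_size(pyaudio.paInt16)

    print("Hold SPACE to talk...")
    keyboard.wait("space")                 # block until the spacebar is pressed
    frames = []
    while keyboard.is_pressed("space"):    # capture audio while it stays held
        frames.append(stream.read(CHUNK))

    stream.stop_stream(); stream.close(); pa.terminate()

    with wave.open(path, "wb") as wf:      # save the capture for transcription
        wf.setnchannels(1)
        wf.setsampwidth(sample_width)
        wf.setframerate(RATE)
        wf.writeframes(b"".join(frames))
    return path
```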
Highlighted Details
Low latency through faster_whisper and ElevenLabs input streaming.
Core dependencies: openai, elevenlabs, pyaudio, keyboard, and faster_whisper.
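The presence of numpy and torch in the dependency list suggests the VAD mode performs per-frame speech detection on the microphone stream. As a purely illustrative stand-in (not the project's actual detector), a simple energy-threshold check looks like this:

```python
# Naive energy-based voice activity check (illustrative only; the real VAD is
# likely more robust). The threshold value is an assumption.
import numpy as np

def is_speech(frame_bytes: bytes, threshold: float = 500.0) -> bool:
    samples = np.frombuffer(frame_bytes, dtype=np.int16).astype(np.float32)
    if samples.size == 0:
        return False
    rms = float(np.sqrt(np.mean(samples ** 2)))  # root-mean-square frame energy
    return rms > threshold
```

In a detector along these lines, frames that pass the check would be buffered and, after a short stretch of silence, flushed to the transcription step.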
Maintenance & Community
The project acknowledges the developers of faster_whisper, ElevenLabs, and OpenAI. Contributions are welcomed via pull requests; opening an issue first is encouraged for significant changes.
Licensing & Compatibility
The repository does not explicitly state a license; suitability for commercial use or closed-source linking is therefore unspecified.
Limitations & Caveats
Performance depends on internet connection speed; the demo was run on a 10 Mbit/s connection. Compatibility with specific operating systems or hardware beyond a standard Python environment is not specified.