Speech assistant using Twilio Voice and OpenAI Realtime API
This project demonstrates a real-time AI voice assistant for phone calls using Twilio Voice Media Streams and OpenAI's Realtime API. It targets developers building interactive voice applications who want to integrate advanced conversational AI. The primary benefit is enabling natural, two-way voice conversations between callers and an AI assistant.
How It Works
The application opens two WebSocket connections at once: one to Twilio's Media Streams and one to OpenAI's Realtime API. Caller audio received from Twilio is streamed to OpenAI, which interprets the speech and generates a spoken response. That synthesized audio is sent back through Twilio to the caller, producing a real-time, two-way conversation. Because audio is relayed as it arrives, with no intermediate storage or batch processing, latency stays low.
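The relay between the two WebSocket connections can be sketched as a pair of message translators. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the Realtime API audio event name is assumed to be `response.audio.delta`, which may differ by API version. Both sides carry audio as base64 strings, so the payload is forwarded unchanged.

```python
import json
from typing import Optional

# Assumption: name of the Realtime API event that carries synthesized audio.
AUDIO_DELTA_EVENT = "response.audio.delta"


def twilio_media_to_openai(twilio_message: str) -> Optional[str]:
    """Turn one Twilio Media Streams message into a Realtime API event.

    Returns None for messages that carry no audio (e.g. "start", "stop").
    """
    data = json.loads(twilio_message)
    if data.get("event") != "media":
        return None
    # The base64 audio payload is passed through without decoding.
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": data["media"]["payload"],
    })


def openai_audio_to_twilio(openai_message: str, stream_sid: str) -> Optional[str]:
    """Turn one Realtime API audio delta into a Twilio "media" message.

    Returns None for events that are not audio deltas.
    """
    event = json.loads(openai_message)
    if event.get("type") != AUDIO_DELTA_EVENT or "delta" not in event:
        return None
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": event["delta"]},
    })
```

In a running server, each translator would sit inside a loop that reads from one socket and writes any non-None result to the other.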
Quick Start & Requirements
Install dependencies: pip install -r requirements.txt
Run the application: python main.py
Configuration is set in a .env file.
Highlighted Details
Realtime API events used: input_audio_buffer.speech_started and conversation.item.truncate
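A common use of these two events is interruption handling: when the Realtime API reports that the caller has started speaking, the server truncates the assistant's in-progress audio and tells Twilio to drop whatever it has buffered. The sketch below assumes that use; the summary does not confirm it. The function name, the `played_ms` bookkeeping, and the `content_index` of 0 are illustrative assumptions.

```python
import json
from typing import List, Optional, Tuple


def handle_speech_started(
    event: dict,
    stream_sid: str,
    last_item_id: Optional[str],
    played_ms: int,
) -> List[Tuple[str, str]]:
    """Build the messages to send when the caller interrupts the assistant.

    Returns (destination, json_message) pairs, where destination is
    "openai" or "twilio". Returns an empty list for any other event.
    """
    if event.get("type") != "input_audio_buffer.speech_started":
        return []

    messages: List[Tuple[str, str]] = []
    if last_item_id is not None:
        # Cut the assistant's audio item at the point the caller has heard.
        messages.append(("openai", json.dumps({
            "type": "conversation.item.truncate",
            "item_id": last_item_id,
            "content_index": 0,  # assumption: audio is the first content part
            "audio_end_ms": max(0, played_ms),
        })))
    # Ask Twilio to discard audio it has buffered but not yet played.
    messages.append(("twilio", json.dumps({
        "event": "clear",
        "streamSid": stream_sid,
    })))
    return messages
```

The caller of this helper would track `last_item_id` and `played_ms` as audio deltas are forwarded, then write each returned message to the matching socket.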
Maintenance & Community
No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project focuses on inbound calls; outbound calling is not directly supported. Local development requires ngrok, so a production deployment may need additional configuration.