NickTikhonov: Real-time phone agent orchestration with sub-500ms latency
Top 52.7% on SourcePulse
This project provides a Python framework for building voice agent orchestrations with sub-500ms latency. It targets developers and researchers aiming to create highly responsive, real-time conversational AI experiences, offering a streamlined approach to integrating STT, LLM, and TTS pipelines.
How It Works
The framework employs two core abstractions: Deepgram Flux for continuous, low-latency Speech-to-Text (STT) and turn detection over a single WebSocket, and an Agent pipeline handling the LLM, Text-to-Speech (TTS), and audio playback. The entire conversational state machine is encapsulated in a pure function (process_event(state, event) -> (state, actions)) within approximately 30 lines of code. A key design principle is end-to-end streaming: LLM tokens immediately feed the TTS engine, and the resulting audio streams directly to the user via Twilio. This architecture enables instant barge-in, where user interruptions are detected and processed immediately, cancelling ongoing audio playback and clearing buffers.
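The pure-function state machine described above can be illustrated with a minimal sketch. Only the signature process_event(state, event) -> (state, actions) comes from the project description; the Phase enum, the event names (end_of_turn, user_started_speaking, agent_finished) and the action names (start_agent, cancel_agent, clear_audio) are hypothetical and chosen for illustration, so the project's actual code may differ.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto
from typing import List, Tuple


class Phase(Enum):
    LISTENING = auto()  # waiting for the caller to finish a turn
    SPEAKING = auto()   # agent reply is being generated and played


@dataclass(frozen=True)
class State:
    phase: Phase = Phase.LISTENING
    history: Tuple[str, ...] = ()  # caller transcripts seen so far


@dataclass(frozen=True)
class Event:
    kind: str       # hypothetical names, see the transitions below
    text: str = ""  # transcript, used by "end_of_turn" only


@dataclass(frozen=True)
class Action:
    kind: str          # side effect for the caller of process_event to run
    payload: str = ""


def process_event(state: State, event: Event) -> Tuple[State, List[Action]]:
    """Return the next state and the side effects to perform.

    The function does no I/O itself: it only describes what should
    happen, which keeps the state machine easy to test.
    """
    if event.kind == "user_started_speaking" and state.phase is Phase.SPEAKING:
        # Barge-in: stop the LLM/TTS pipeline and drop any queued audio.
        next_state = replace(state, phase=Phase.LISTENING)
        return next_state, [Action("cancel_agent"), Action("clear_audio")]

    if event.kind == "end_of_turn" and state.phase is Phase.LISTENING:
        # The caller finished a turn: record it and start the reply.
        next_state = State(Phase.SPEAKING, state.history + (event.text,))
        return next_state, [Action("start_agent", event.text)]

    if event.kind == "agent_finished" and state.phase is Phase.SPEAKING:
        # Playback completed without interruption.
        return replace(state, phase=Phase.LISTENING), []

    # Any other combination leaves the state unchanged.
    return state, []


# Usage: fold a sequence of events through the state machine.
state = State()
for ev in [Event("end_of_turn", "hello"), Event("user_started_speaking")]:
    state, actions = process_event(state, ev)
    print(ev.kind, "->", state.phase.name, [a.kind for a in actions])
```

Because the function only returns a description of the actions, a surrounding event loop (not shown) would be responsible for actually cancelling the LLM and TTS tasks and clearing the audio buffer on the telephony side.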
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt.
Copy .env.example to .env and populate with API keys.
Run ngrok http 3040 in a separate terminal.
Execute the main script with python main.py +1234567890.
An ngrok URL (https://mature-spaniel-physically.ngrok-free.app) is shown during execution, indicating a live demo capability.
Highlighted Details
Maintenance & Community
No specific details regarding contributors, sponsorships, community channels (like Discord/Slack), or roadmaps are provided in the README.
Licensing & Compatibility
The project is released under the MIT License, which is highly permissive and generally compatible with commercial use and closed-source applications.
Limitations & Caveats
Initial setup requires obtaining and configuring multiple third-party API keys (Twilio, Deepgram, OpenAI, ElevenLabs) and running a tunneling service (ngrok), which can present a barrier to entry for quick experimentation. The project is presented as a framework, suggesting it may require further development for specific application needs.