call-gpt by twilio-labs

Generative AI phone call toolkit using Twilio Media Streams

created 1 year ago
459 stars

Top 66.9% on sourcepulse

Project Summary

This project provides a toolkit for building generative AI-powered phone call applications, leveraging Twilio Media Streams for real-time audio processing. It's designed for developers and researchers looking to create advanced voice applications that go beyond traditional IVR systems, offering features like low-latency responses, interruptibility, chat history, and tool integration.

How It Works

The application utilizes Twilio Media Streams to establish a WebSocket connection for bidirectional audio flow during phone calls. It integrates Deepgram for speech-to-text (STT) and text-to-speech (TTS) and OpenAI's GPT models for natural language understanding and response generation. This streaming architecture enables low-latency interactions and allows users to interrupt the AI, making conversations more dynamic and human-like.
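The message flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it parses the JSON messages Twilio sends over the Media Streams WebSocket (`start`, `media`, `stop`) and decodes the base64 audio payload. The names `CallState`, `newCallState`, `decodeBase64`, and `handleTwilioMessage` are hypothetical, and a real server would presumably forward each decoded chunk to the STT service instead of buffering it.

```typescript
// Inbound Media Streams messages this sketch handles (other events are ignored).
type TwilioMessage =
  | { event: "connected" }
  | { event: "start"; start: { streamSid: string; callSid: string } }
  | { event: "media"; media: { payload: string } }
  | { event: "stop" };

// Hypothetical per-call state; the real project may track different fields.
interface CallState {
  streamSid: string | null;
  audioChunks: Uint8Array[];
  ended: boolean;
}

function newCallState(): CallState {
  return { streamSid: null, audioChunks: [], ended: false };
}

// Decode a base64 string into raw bytes (Twilio sends mu-law audio at 8 kHz).
function decodeBase64(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Apply one raw WebSocket message to the call state.
function handleTwilioMessage(raw: string, state: CallState): void {
  const msg = JSON.parse(raw) as TwilioMessage;
  switch (msg.event) {
    case "start":
      // The streamSid is needed later to address outbound messages.
      state.streamSid = msg.start.streamSid;
      break;
    case "media":
      // Assumption: buffered here for simplicity; a real app would stream to STT.
      state.audioChunks.push(decodeBase64(msg.media.payload));
      break;
    case "stop":
      state.ended = true;
      break;
    default:
      // "connected" and any other events need no handling in this sketch.
      break;
  }
}
```

In a server, `handleTwilioMessage` would be called from the WebSocket's message callback with one `CallState` per connection.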

Quick Start & Requirements

  • Install: npm install
  • Run: npm run dev
  • Prerequisites:
    • Node.js
    • Deepgram API Key
    • OpenAI API Key
    • Twilio Account SID, Auth Token, and a Twilio phone number
    • Ngrok (for local development)
  • Setup: Sign up for Deepgram and OpenAI, copy .env.example to .env and fill in your credentials, and set up a Twilio webhook. For local development, use ngrok to expose your local server.
  • Links: Twilio Media Streams, Deepgram, OpenAI
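
Because the setup depends on several credentials, a startup check for missing environment variables can catch configuration mistakes early. The sketch below is one possible approach, not part of the project. The variable names in `REQUIRED_VARS` are assumptions chosen for illustration; the authoritative names are the ones listed in the project's .env.example.

```typescript
// Assumed variable names for illustration only; check .env.example for the real ones.
const REQUIRED_VARS = [
  "OPENAI_API_KEY",
  "DEEPGRAM_API_KEY",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
] as const;

// Return the names of required variables that are unset or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_VARS
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Typical use at startup in Node.js (left as a comment so the sketch stays self-contained):
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) throw new Error("Missing env vars: " + missing.join(", "));
```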

Highlighted Details

  • Real-time streaming keeps response latency low (typically around 1 second).
  • Supports user interruption and maintains chat history.
  • Enables GPT to call external tools via function calling.
  • Customizable GPT system prompts for persona and behavior control.
  • Includes utility scripts for inbound and outbound call testing.
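
One way user interruption could work over a bidirectional Media Stream is sketched below. It relies on two documented Twilio message types: a `mark` message, which Twilio echoes back once the preceding audio has played, and a `clear` message, which discards audio still buffered on Twilio's side. The `PlaybackTracker` class and its method names are hypothetical, and this is an assumption about the approach, not a description of the project's implementation.

```typescript
// Tracks which audio chunks are still queued on Twilio's side so that
// user speech can cancel them. All names here are illustrative.
class PlaybackTracker {
  private readonly streamSid: string;
  private pendingMarks: string[] = [];

  constructor(streamSid: string) {
    this.streamSid = streamSid;
  }

  // Call after sending an audio chunk; returns the mark message to send next.
  markMessage(label: string): string {
    this.pendingMarks.push(label);
    return JSON.stringify({
      event: "mark",
      streamSid: this.streamSid,
      mark: { name: label },
    });
  }

  // Call when Twilio echoes a mark back, meaning that chunk finished playing.
  onMarkEcho(label: string): void {
    this.pendingMarks = this.pendingMarks.filter((m) => m !== label);
  }

  // Call when STT detects user speech. Returns a clear message if the
  // assistant is still talking, or null if nothing is queued.
  onUserSpeech(): string | null {
    if (this.pendingMarks.length === 0) {
      return null;
    }
    this.pendingMarks = [];
    return JSON.stringify({ event: "clear", streamSid: this.streamSid });
  }
}
```

The returned strings would be sent over the same WebSocket that carries the call audio. Tracking marks avoids sending `clear` when the assistant is not speaking.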

Maintenance & Community

This project is hosted under the twilio-labs GitHub organization, which suggests some involvement from Twilio but does not by itself guarantee official support. The README does not list community channels or active maintainers.

Licensing & Compatibility

The README does not explicitly state a license. However, as a twilio-labs project, it's likely intended for demonstration and experimentation, and users should verify licensing for production use. Compatibility with commercial, closed-source applications would depend on the licenses of the underlying services (Deepgram, OpenAI, Twilio).

Limitations & Caveats

The project relies on third-party APIs (Deepgram, OpenAI), which may incur costs and have their own rate limits or availability issues. The Eleven Labs TTS integration example notes potential rate-limiting errors. Deployment to Fly.io is recommended for stable performance, which suggests local hosting may be less reliable due to network variability.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

35 stars in the last 90 days
