realtime-phone-agents-course by neural-maze

Build realtime AI voice agents for scalable call centers

Created 2 months ago
825 stars

Top 43.0% on SourcePulse

Project Summary

This course teaches how to build production-ready, real-time AI voice agent systems, simulating a call center for a real estate company. It targets Software, ML, and AI Engineers seeking to develop complex, end-to-end applications with low-latency communication and advanced data retrieval capabilities. The benefit lies in mastering the integration of cutting-edge tools for sophisticated voice agent deployment.

How It Works

The system integrates FastRTC for low-latency streaming conversations, Superlinked for sophisticated multi-attribute data search, and Twilio for managing live phone calls. Speech is transcribed using Moonshine and Fast Whisper, while voice generation employs Kokoro and Orpheus 3B. Scalable GPU deployment is facilitated by Runpod. This approach enables real-time, interactive voice agents capable of complex data querying and communication management.
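
At its core, the streaming loop follows FastRTC's documented reply-on-pause pattern: transcribe the caller's turn, generate a reply, and stream synthesized audio back. Below is a minimal sketch assuming that pattern; generate_reply is a hypothetical placeholder for the course's agent logic, which this summary does not detail.

```python
# Minimal sketch of a FastRTC voice loop; the course's actual agent logic
# and model configuration may differ.
from fastrtc import ReplyOnPause, Stream, get_stt_model, get_tts_model

stt_model = get_stt_model()   # FastRTC's bundled STT helper (Moonshine by default)
tts_model = get_tts_model()   # FastRTC's bundled TTS helper (Kokoro by default)

def generate_reply(prompt: str) -> str:
    # Hypothetical placeholder for the agent's reasoning/LLM step.
    return f"You said: {prompt}"

def handler(audio):
    # `audio` arrives as (sample_rate, numpy array); transcribe the turn,
    # produce a reply, then stream synthesized speech back chunk by chunk.
    prompt = stt_model.stt(audio)
    reply = generate_reply(prompt)
    for chunk in tts_model.stream_tts_sync(reply):
        yield chunk

# ReplyOnPause invokes the handler as soon as the caller stops speaking.
stream = Stream(ReplyOnPause(handler), modality="audio", mode="send-receive")

if __name__ == "__main__":
    stream.ui.launch()  # serves a local browser demo, akin to the Gradio make target
```

ReplyOnPause keeps latency low by triggering on detected pauses in the caller's speech rather than waiting for fixed-size audio windows.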

Quick Start & Requirements

  • Primary commands include make start-gradio-application for a local demo and make start-call-center for the FastAPI-based call center; exposing the local server to Twilio requires make start-ngrok-tunnel (a hedged webhook sketch follows this list).
  • Prerequisites include ffmpeg (whose bundled ffprobe is required) and a Twilio account. Detailed setup and dependency installation instructions are in docs/GETTINGS_STARTED.md.
  • Links: docs/GETTINGS_STARTED.md, The Neural Maze YouTube channel.
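
The call-center path wires Twilio into the same streaming pipeline. A hedged sketch of the inbound-call webhook, assuming a FastAPI app and the twilio helper library; the endpoint path, websocket path, and ngrok domain are placeholders, and the course's actual wiring lives in docs/GETTINGS_STARTED.md:

```python
# Sketch of an inbound-call webhook: Twilio posts here when a call arrives,
# and the returned TwiML tells it to open a media stream to our websocket.
from fastapi import FastAPI
from fastapi.responses import Response
from twilio.twiml.voice_response import Connect, VoiceResponse

app = FastAPI()

@app.post("/incoming-call")          # placeholder path
async def incoming_call():
    response = VoiceResponse()
    connect = Connect()
    # The websocket URL must be publicly reachable, e.g. the ngrok tunnel
    # started by make start-ngrok-tunnel.
    connect.stream(url="wss://<your-ngrok-domain>/media-stream")  # placeholder URL
    response.append(connect)
    return Response(content=str(response), media_type="application/xml")
```

The websocket side, which exchanges audio frames with Twilio and feeds them into the conversational loop, is what FastRTC handles in the course.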

Highlighted Details

  • Simulates a real estate company staffed by AI voice agents.
  • Full Twilio integration for inbound and outbound call handling.
  • Real-time conversational capabilities powered by FastRTC.
  • Advanced retrieval using Superlinked, enabling agents to handle complex, multi-attribute queries (e.g., property search by location and price); a concept sketch of this kind of query follows this list.
  • Integrated STT/TTS pipelines using Moonshine, Fast Whisper, Kokoro, and Orpheus 3B.
  • Scalable deployment options using Runpod for GPU acceleration.
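
To illustrate what a multi-attribute property query looks like, here is a plain-Python concept sketch with made-up data; it does not use or mirror Superlinked's API, which handles this with weighted vector spaces rather than hand-written filters.

```python
# Concept illustration of multi-attribute retrieval: hard-filter on some
# attributes, then rank by proximity on another. Example data is invented.
from dataclasses import dataclass

@dataclass
class Property:
    address: str
    city: str
    price: float
    bedrooms: int

LISTINGS = [
    Property("12 Oak St", "Austin", 350_000, 3),
    Property("98 Elm Ave", "Austin", 520_000, 4),
    Property("7 Pine Rd", "Dallas", 340_000, 3),
]

def search(city: str, target_price: float, min_bedrooms: int, top_k: int = 2):
    # Filter on location and bedroom count, then rank by closeness to the
    # caller's target price -- the kind of query an agent answers mid-call.
    candidates = [p for p in LISTINGS if p.city == city and p.bedrooms >= min_bedrooms]
    candidates.sort(key=lambda p: abs(p.price - target_price))
    return candidates[:top_k]

print(search("Austin", target_price=400_000, min_bedrooms=3))
```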

Maintenance & Community

  • Key contributors include Miguel Otero Pedrido and Jesús Copado from The Neural Maze.
  • Community engagement is fostered through The Neural Maze Newsletter and a YouTube channel featuring AI project deep dives.

Licensing & Compatibility

  • The project is licensed under the MIT License.
  • This license is permissive, generally allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The course is structured as weekly lessons, each pairing an article with code, and is intended to be followed sequentially.
  • Specific setup instructions are deferred to external documentation (docs/GETTINGS_STARTED.md).
  • Receiving Twilio calls requires a publicly reachable server; during development this means tunneling the local server, typically with ngrok.
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 0
  • Star History: 535 stars in the last 30 days
