ElatoAI by akdeb

Realtime speech AI agents for ESP32 devices

Created 9 months ago

1,298 stars

Top 30.6% on SourcePulse

2 Experts Love This Project

luiscape

Cofounder of Lightning AI

longouyang

Research Scientist at OpenAI

Project Summary

ElatoAI provides a framework for building real-time AI speech agents on ESP32 microcontrollers, aimed at hobbyists and developers who are building AI companions or toys. It supports extended, uninterrupted conversations (up to 10 minutes per session) by combining OpenAI's Realtime API, Deno edge functions, and secure WebSockets.

How It Works

The system has three parts: a Next.js frontend for managing AI agents, Deno edge functions that handle WebSocket connections and API calls, and an ESP32 client that captures and plays audio. The ESP32 records speech and sends it over secure WebSockets to a Deno edge function, which passes it to OpenAI's Realtime API and streams the AI's spoken response back to the ESP32 for playback. Running this relay on edge functions keeps latency low, and Opus compression reduces the bandwidth needed for audio streaming.
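The data flow above can be sketched as two translation steps that a relay might apply to each WebSocket message. This is a minimal, hypothetical sketch, not ElatoAI's code: it assumes the device sends raw PCM16 frames as binary messages (ElatoAI uses Opus, so a real relay would also decode and encode audio, which is omitted here), and the function names are illustrative. The event names input_audio_buffer.append and the audio delta event come from OpenAI's Realtime API documentation; the delta event was renamed between API versions, so both spellings are accepted.

```typescript
// Hypothetical relay helpers between an ESP32 and OpenAI's Realtime API.
// Assumption: device frames are raw PCM16 bytes; Opus handling is omitted.

type RealtimeEvent = { type: string; [key: string]: unknown };

// Encode bytes as base64 via the global btoa (Deno, browsers, Node 16+).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Decode base64 back into bytes via the global atob.
function fromBase64(b64: string): Uint8Array {
  const binary = atob(b64);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) out[i] = binary.charCodeAt(i);
  return out;
}

// Upstream: wrap one microphone frame from the device as a Realtime client event.
function deviceFrameToEvent(frame: Uint8Array): string {
  const event: RealtimeEvent = {
    type: "input_audio_buffer.append",
    audio: toBase64(frame),
  };
  return JSON.stringify(event);
}

// Downstream: pull playable audio out of a Realtime server event, or return
// null for events the device does not need (transcripts, status, etc.).
function eventToDeviceFrame(message: string): Uint8Array | null {
  const event = JSON.parse(message) as RealtimeEvent;
  const delta = event.delta;
  const isAudioDelta =
    event.type === "response.audio.delta" || // beta API name
    event.type === "response.output_audio.delta"; // GA API name
  if (!isAudioDelta || typeof delta !== "string") return null;
  return fromBase64(delta);
}
```

In an edge function, these helpers would sit between two WebSocket handlers: one forwards deviceFrameToEvent(frame) to OpenAI, and the other sends any non-null eventToDeviceFrame(message) result back to the ESP32.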

Quick Start & Requirements

Local Setup: Requires the Supabase CLI (for the local backend), Node.js (v22.13.0), Next.js (v14.2.7), React (v18.2.0), and Deno.
ESP32: An ESP32-S3 DevKitC-1 board, with PlatformIO and the Arduino framework.
Dependencies: OpenAI API key, Supabase API key.
Setup: Start Supabase locally (supabase start), run npm install and npm run dev for the frontend, and start the Deno server with deno run -A --env-file=.env main.ts. Then upload the ESP32 firmware and configure Wi-Fi through the captive portal.
Links: Demo Video, Homepage, Frontend README, Deno Server README, ESP32 Device README.
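Because the Deno server is started with --env-file=.env and depends on OpenAI and Supabase keys, a startup check that fails fast on missing values can save debugging time. The sketch below uses hypothetical variable names (OPENAI_API_KEY, SUPABASE_URL, SUPABASE_KEY) and a hypothetical helper missingEnvVars; the Deno Server README is the authority on the names ElatoAI actually reads.

```typescript
// Hypothetical startup check for the Deno server's environment.
// The variable names are illustrative assumptions, not ElatoAI's.
const REQUIRED_VARS = ["OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_KEY"] as const;

// Returns the names of required variables that are missing or blank.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_VARS,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}
```

In Deno, the map could come from Deno.env.toObject(), which includes values loaded via --env-file; a non-empty result would be reported and the server would exit before accepting device connections.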

Highlighted Details

Real-time speech-to-speech with OpenAI's Realtime APIs.
Global low-latency performance via Deno Edge Functions.
Supports custom AI agents, voices, and conversation history.
ESP32 client requires no PSRAM and includes Wi-Fi management via captive portal.
Over-the-Air (OTA) updates for ESP32 firmware.
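One simple way to combine a custom agent with stored conversation history is to append the most recent turns to the agent's instructions when a session starts. This is an assumption, not ElatoAI's implementation: AgentConfig, Turn, recentHistory and buildInstructions are hypothetical names, and a character budget stands in for a real token count.

```typescript
// Hypothetical shapes for a custom agent and its stored history;
// ElatoAI's actual Supabase schema may differ.
interface AgentConfig {
  name: string;
  voice: string; // a voice name offered by the Realtime API
  instructions: string; // persona / system prompt
}

interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Keep the most recent turns whose combined text fits within maxChars.
function recentHistory(history: Turn[], maxChars: number): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  // Walk from newest to oldest so the latest context survives the budget.
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = history[i].text.length;
    if (used + cost > maxChars) break;
    kept.unshift(history[i]);
    used += cost;
  }
  return kept;
}

// Build session instructions: the persona first, then any recent history.
function buildInstructions(agent: AgentConfig, history: Turn[], maxChars = 4000): string {
  const lines = recentHistory(history, maxChars).map((t) => `${t.role}: ${t.text}`);
  return lines.length === 0
    ? agent.instructions
    : `${agent.instructions}\n\nPrevious conversation:\n${lines.join("\n")}`;
}
```

Folding history into the instructions keeps the sketch small; the Realtime API also accepts prior messages as separate conversation items (conversation.item.create), which may suit longer histories better.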

Maintenance & Community

Under active development; contributions are welcome.
Discord community available: https://discord.gg/KJWxDPBRUj.

Licensing & Compatibility

Licensed under the MIT License.
Compatible with commercial use and closed-source linking.

Limitations & Caveats

Cold starts take 3-4 seconds.
Uninterrupted conversations are limited to 10 minutes.
The Deno edge function stops processing once its wall-clock time limit is exceeded.
No speech interruption detection is implemented on the ESP32.
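Given the 10-minute conversation cap and the wall-clock limit listed above, an application may want to track session age and let the agent wrap up gracefully rather than be cut off mid-sentence. This hypothetical helper (sessionPhase) assumes the session clock starts when the WebSocket connection opens; the 30-second warning window is an arbitrary choice.

```typescript
// Hypothetical session-budget helper; limits are assumptions based on the
// 10-minute cap, with an arbitrary 30-second wrap-up window.
const SESSION_LIMIT_MS = 10 * 60 * 1000;
const WARN_BEFORE_MS = 30 * 1000;

type SessionPhase = "active" | "wrap-up" | "expired";

// Classify a session by how much of its time budget remains.
function sessionPhase(
  startedAtMs: number,
  nowMs: number,
  limitMs = SESSION_LIMIT_MS,
  warnBeforeMs = WARN_BEFORE_MS,
): { phase: SessionPhase; remainingMs: number } {
  const remainingMs = Math.max(0, limitMs - (nowMs - startedAtMs));
  if (remainingMs === 0) return { phase: "expired", remainingMs };
  if (remainingMs <= warnBeforeMs) return { phase: "wrap-up", remainingMs };
  return { phase: "active", remainingMs };
}
```

On entering "wrap-up", the relay could, for example, ask the agent to say goodbye, and on "expired" close the WebSocket cleanly so the device can reconnect.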

Health Check

Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)

0

Issues (30d)

0

Star History

39 stars in the last 30 days

Explore Similar Projects

alibabacloud-bailian-speech-demo by aliyun

Speech AI SDK demos for AlibabaCloud Bailian

Created 1 year ago

Updated 3 weeks ago

Starred by

Elvis Saravia (Founder of DAIR.AI)

S.A.T.U.R.D.A.Y by GRVYDEV

Vocal computing toolbox for building voice interfaces to LLMs

Created 2 years ago

Updated 2 years ago

Starred by

Georgi Gerganov (Author of llama.cpp, whisper.cpp)

pi-card by nkasmanoff

Voice assistant for Raspberry Pi

Created 1 year ago

Updated 1 year ago

gpt-voice-conversation-chatbot by Adri6336

Voice chatbot for engaging spoken conversations with ChatGPT/GPT-4

Created 2 years ago

Updated 1 year ago

AIVoiceChat by KoljaB

Voice chat for low-latency AI companion interaction

Created 2 years ago

Updated 6 months ago

LocalAIVoiceChat by KoljaB

Local AI voice chat for real-time conversations

Created 2 years ago

Updated 6 months ago

swift-realtime-openai by m1guelpf

Swift SDK for OpenAI's Realtime API, enabling multimodal conversations

Created 1 year ago

Updated 3 months ago

free4chat by i365dev

Real-time audio chat service emphasizing local-first and privacy

Created 4 years ago

Updated 10 months ago

speech-assistant-openai-realtime-api-node by twilio-samples

Node.js app for AI speech assistant using Twilio Voice and OpenAI

Created 1 year ago

Updated 4 months ago

bolna by bolna-ai

Voice AI agents platform for building conversational apps

Created 1 year ago

Updated 1 day ago

Starred by

Luis Capelo (Cofounder of Lightning AI)

RealtimeVoiceChat by KoljaB

Real-time voice chat with AI using streaming audio

Created 8 months ago

Updated 6 months ago

Starred by

Chaoyu Yang (Founder of Bento), Nir Gazit (Cofounder of Traceloop), and 4 more

pipecat by pipecat-ai

Open-source framework for building real-time voice and multimodal conversational AI agents

Created 2 years ago

Updated 1 day ago
