ElatoAI by akdeb

Realtime speech AI agents for ESP32 devices

created 3 months ago
1,095 stars

Top 35.4% on sourcepulse

Project Summary

ElatoAI provides a framework for building real-time AI speech agents on ESP32 microcontrollers, targeting hobbyists and developers creating AI companions or toys. It supports uninterrupted conversations of up to 10 minutes by integrating OpenAI's Realtime API, Deno edge functions, and secure WebSockets.

How It Works

The system has three parts: a Next.js frontend for managing AI agents, Deno edge functions that handle WebSocket connections and API calls, and an ESP32 client for audio processing and communication. The ESP32 captures speech and sends it over secure WebSockets to the Deno edge functions, which pass it to OpenAI's Realtime API; the AI's response is then streamed back to the ESP32 for playback. The architecture relies on edge computing for low latency and Opus compression for efficient audio streaming.
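
The relay step in the middle of that pipeline can be sketched roughly as follows. This is a minimal illustration, not the project's actual server code: the SocketLike interface and the function names are hypothetical, the event names input_audio_buffer.append and response.audio.delta are assumptions based on OpenAI's Realtime API event naming and may differ between API versions, and any audio transcoding (such as Opus encoding) is left out.

```typescript
// Minimal socket shape for this sketch (hypothetical); a real WebSocket satisfies it.
interface SocketLike {
  send(data: string | Uint8Array): void;
}

// Encode raw bytes as base64 using btoa, which exists in Deno and modern Node.
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Decode a base64 string back into raw bytes.
function fromBase64(text: string): Uint8Array {
  const binary = atob(text);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    out[i] = binary.charCodeAt(i);
  }
  return out;
}

// Device -> model: wrap one audio chunk in a JSON event and send it upstream.
// The event name is an assumption about the Realtime API, not taken from the source.
function forwardDeviceAudio(chunk: Uint8Array, upstream: SocketLike): void {
  upstream.send(
    JSON.stringify({ type: "input_audio_buffer.append", audio: toBase64(chunk) }),
  );
}

// Model -> device: forward audio deltas as binary frames and ignore other events.
// Returns true when a frame was sent to the device.
function forwardModelEvent(raw: string, device: SocketLike): boolean {
  let event: { type?: string; delta?: string };
  try {
    event = JSON.parse(raw);
  } catch {
    return false; // not JSON, nothing to forward
  }
  if (event.type === "response.audio.delta" && typeof event.delta === "string") {
    device.send(fromBase64(event.delta));
    return true;
  }
  return false;
}
```

In a real server these two functions would be called from the message handlers of the device-facing and OpenAI-facing WebSockets respectively; keeping them free of socket setup makes the routing logic easy to test in isolation.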

Quick Start & Requirements

  • Local Setup: Requires Supabase CLI for local backend, Node.js (v22.13.0), Next.js (v14.2.7), React (v18.2.0), and Deno.
  • ESP32: ESP32-S3 DevKitC-1 board with PlatformIO and Arduino framework.
  • Dependencies: OpenAI API key, Supabase API key.
  • Setup: Start Supabase locally, run npm install and npm run dev for the frontend, then start the Deno server with deno run -A --env-file=.env main.ts. Upload the ESP32 firmware and configure Wi-Fi through the captive portal.
  • Links: Demo Video, Homepage, Frontend README, Deno Server README, ESP32 Device README.
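
Since the server depends on an OpenAI API key and a Supabase API key, a small startup check can catch a missing .env entry early. The sketch below is illustrative only: the variable names OPENAI_API_KEY and SUPABASE_KEY are assumptions, not names confirmed by the project, so check the project's READMEs for the real ones.

```typescript
// Hypothetical variable names; the project's .env may use different ones.
const REQUIRED_KEYS: readonly string[] = ["OPENAI_API_KEY", "SUPABASE_KEY"];

// Return the names of required variables that are absent or blank.
function missingKeys(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_KEYS,
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

Under Deno, the environment could be passed in as Deno.env.toObject(); the server would then refuse to start and print the returned names if the list is not empty.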

Highlighted Details

  • Real-time speech-to-speech with OpenAI's Realtime APIs.
  • Global low-latency performance via Deno Edge Functions.
  • Supports custom AI agents, voices, and conversation history.
  • ESP32 client requires no PSRAM and includes Wi-Fi management via captive portal.
  • Over-the-Air (OTA) updates for ESP32 firmware.

Maintenance & Community

Licensing & Compatibility

  • Licensed under the MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Exhibits a 3-4 second cold start time.
  • Uninterrupted conversations are limited to 10 minutes.
  • The edge server stops processing after the wall clock time limit is exceeded.
  • No speech interruption detection is implemented on the ESP32.
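
One way to live with the 10-minute limit is to track the session budget on the server and warn the device shortly before the cutoff, so the conversation can end cleanly instead of stopping mid-sentence. The sketch below assumes the 10-minute figure from the list above; the SessionBudget class, the phase names, and the 30-second warning window are hypothetical and not part of the project.

```typescript
const SESSION_LIMIT_MS = 10 * 60 * 1000; // 10-minute limit stated above
const WARN_BEFORE_MS = 30 * 1000; // hypothetical warning window

type SessionPhase = "active" | "closing-soon" | "expired";

// Tracks elapsed wall-clock time for one conversation.
// The clock is injectable so the logic can be tested without waiting.
class SessionBudget {
  private readonly now: () => number;
  private readonly limitMs: number;
  private readonly startedAt: number;

  constructor(now: () => number = Date.now, limitMs: number = SESSION_LIMIT_MS) {
    this.now = now;
    this.limitMs = limitMs;
    this.startedAt = now();
  }

  // Milliseconds left before the limit, never negative.
  remainingMs(): number {
    return Math.max(0, this.limitMs - (this.now() - this.startedAt));
  }

  // Coarse state a server could use to decide when to warn or close.
  phase(warnBeforeMs: number = WARN_BEFORE_MS): SessionPhase {
    const remaining = this.remainingMs();
    if (remaining === 0) return "expired";
    if (remaining <= warnBeforeMs) return "closing-soon";
    return "active";
  }
}
```

A server could poll phase() on a timer, send a short notice to the device when it first returns "closing-soon", and close both sockets on "expired".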

Health Check

  • Last commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 186 stars in the last 90 days
