ESP32_AI_LLM by Explorerlowi

ESP32-based voice assistant for LLM interaction

Created 1 year ago

475 stars

Top 64.2% on SourcePulse

Project Summary

This project provides an ESP32-based smart voice assistant that can talk to 15 different large language models (LLMs), including ChatGPT, Claude, and Xunfei Spark. It targets hobbyists and developers who want to build custom voice-controlled AI devices with voice wake-up, continuous conversation, and music playback, with responses shown on a small screen.

How It Works

The system leverages an ESP32 or ESP32-S3 microcontroller to manage audio input via an INMP441 microphone and audio output through a MAX98357 amplifier. Speech is processed by either the Xunfei STT (Speech-to-Text) service for online wake-up or an ASRPRO module for offline wake-up and command recognition. Recognized speech is then sent to one of the supported LLMs via WebSocket for text generation. The LLM's response is converted back to speech using Baidu's TTS (Text-to-Speech) service and displayed on a 1.8-inch RGB_TFT screen. Continuous conversation is enabled by automatically re-initiating recording after an LLM response.
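The flow described above (wake, record, transcribe, query the LLM, speak, then record again) can be pictured as a small state machine. The following is only a sketch under stated assumptions, not the project's code: the state names, the Services struct, and the Assistant class are hypothetical, the cloud calls are replaced by plain callbacks so the example compiles on a desktop, and returning to the wake state on an empty transcript is an assumed policy that the README does not describe.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical states; the names are illustrative, not taken from the repository.
enum class State { WaitingForWake, Recording, Transcribing, QueryingLlm, Speaking };

// Stand-ins for the cloud services. On real hardware these would wrap the
// STT, LLM (WebSocket), and TTS/display calls.
struct Services {
    std::function<std::string()> transcribe;                  // recorded audio -> text
    std::function<std::string(const std::string&)> askLlm;    // prompt -> reply
    std::function<void(const std::string&)> speakAndDisplay;  // TTS plus screen output
};

class Assistant {
public:
    explicit Assistant(Services services) : services_(std::move(services)) {
        // All three callbacks must be set before the loop can run.
        assert(services_.transcribe && services_.askLlm && services_.speakAndDisplay);
    }

    State state() const { return state_; }

    // Called when the wake word is detected (online or offline).
    void onWakeWord() {
        if (state_ == State::WaitingForWake) state_ = State::Recording;
    }

    // Advance the pipeline by one stage.
    void step() {
        switch (state_) {
        case State::WaitingForWake:
            break;  // nothing to do until the wake word arrives
        case State::Recording:
            state_ = State::Transcribing;  // recording finished, hand audio to STT
            break;
        case State::Transcribing:
            text_ = services_.transcribe();
            // Assumed policy: silence ends the conversation.
            state_ = text_.empty() ? State::WaitingForWake : State::QueryingLlm;
            break;
        case State::QueryingLlm:
            reply_ = services_.askLlm(text_);
            state_ = State::Speaking;
            break;
        case State::Speaking:
            services_.speakAndDisplay(reply_);
            state_ = State::Recording;  // continuous conversation: record again
            break;
        }
    }

private:
    Services services_;
    State state_ = State::WaitingForWake;
    std::string text_;
    std::string reply_;
};
```

Keeping the services behind callbacks is a design choice for this sketch only: it lets the transitions be exercised without a microphone or network, and the Speaking to Recording transition is the part that corresponds to the automatic re-recording mentioned above.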

Quick Start & Requirements

Installation: Clone the repository, configure VS Code with PlatformIO, install the ASRPRO client software, and connect hardware components.
Prerequisites: ESP32/ESP32-S3 development board, INMP441 microphone, MAX98357 amplifier, 1.8-inch RGB_TFT screen, ASRPRO module, Xunfei AI services (LLM and STT), and optionally other LLM API keys.
Setup: Requires configuring API keys in main.cpp and potentially flashing the ASRPRO module with custom wake words.
Resources: Access to Xunfei developer platform is mandatory.
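Because setup depends on API keys being filled in by hand, a start-up check for empty credentials can catch a missed key before any network call is made. The sketch below is an assumption-laden illustration: the Credentials struct, its field names, and missingFields are hypothetical and do not mirror the identifiers actually used in the project's main.cpp, and which keys count as required is likewise assumed.

```cpp
#include <string>
#include <vector>

// Hypothetical credential bundle; field names are illustrative only.
struct Credentials {
    std::string xunfeiAppId;
    std::string xunfeiApiKey;
    std::string xunfeiApiSecret;
    std::string baiduTtsKey;
    std::string optionalLlmKey;  // other LLM providers are optional, so this may stay empty
};

// Return the names of required fields that are still empty.
std::vector<std::string> missingFields(const Credentials& c) {
    std::vector<std::string> missing;
    auto require = [&missing](const char* name, const std::string& value) {
        if (value.empty()) missing.push_back(name);
    };
    require("xunfeiAppId", c.xunfeiAppId);
    require("xunfeiApiKey", c.xunfeiApiKey);
    require("xunfeiApiSecret", c.xunfeiApiSecret);
    require("baiduTtsKey", c.baiduTtsKey);
    return missing;
}
```

On a device, the returned names could be printed to the serial console or shown on the screen so a missing key is visible right after flashing.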

Highlighted Details

Supports 15 LLMs including ChatGPT, Claude, Gemini, Grok, Mistral, and various Chinese models.
Features offline wake-up via ASRPRO with customizable wake words.
Includes web-based configuration for WiFi and LLM parameters.
Offers music playback from NetEase Cloud Music (non-VIP).
Provides "abstract entertainment" features triggered by specific voice commands.
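Features that are triggered by specific voice commands imply that recognized text is inspected before it is forwarded to an LLM. One way this could look is sketched below; the "play " trigger phrase, the Route enum, and routeTranscript are hypothetical and chosen for illustration, since the README summary does not list the project's actual command phrases.

```cpp
#include <string>

// Where a transcript should be sent. Names are illustrative.
enum class Route { Music, Llm };

struct Routed {
    Route route;
    std::string payload;  // song title for Music, full prompt for Llm
};

// Send "play <title>" to the music player and everything else to the LLM.
Routed routeTranscript(const std::string& text) {
    static const std::string kPlayPrefix = "play ";  // hypothetical trigger phrase
    const bool hasPrefix =
        text.size() > kPlayPrefix.size() &&
        text.compare(0, kPlayPrefix.size(), kPlayPrefix) == 0;
    if (hasPrefix) {
        return {Route::Music, text.substr(kPlayPrefix.size())};
    }
    return {Route::Llm, text};
}
```

For example, routeTranscript("play Yesterday") yields the Music route with payload "Yesterday", while a bare "play " with no title falls through to the LLM route.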

Maintenance & Community

The project is based on Esp32_VoiceChat_LLMs. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project relies heavily on external cloud services (Xunfei, Baidu TTS, LLMs), requiring active internet connectivity and potentially incurring costs. Offline wake-up requires a separate ASRPRO module. Music playback may be affected by NetEase Cloud Music server changes or limitations. Long or English song titles may not be recognized accurately by the STT service.
