ESP32-based voice assistant for LLM interaction
This project provides an ESP32-based smart voice assistant that can interact with 15 large language models (LLMs), including ChatGPT, Claude, and Xunfei Spark. It targets hobbyists and developers who want to build custom voice-controlled AI devices with voice wake-up, continuous conversation, and music playback, with output shown on a small screen.
How It Works
The system leverages an ESP32 or ESP32-S3 microcontroller to manage audio input via an INMP441 microphone and audio output through a MAX98357 amplifier. Speech is processed by either the Xunfei STT (Speech-to-Text) service for online wake-up or an ASRPRO module for offline wake-up and command recognition. Recognized speech is then sent to one of the supported LLMs via WebSocket for text generation. The LLM's response is converted back to speech using Baidu's TTS (Text-to-Speech) service and displayed on a 1.8-inch RGB_TFT screen. Continuous conversation is enabled by automatically re-initiating recording after an LLM response.
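The flow described above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the project's actual code: the State and Event names, the next_state function, the continuous flag, and the choice to fall back to wake-word listening on any error are all hypothetical and do not come from the README. It models only the control flow and leaves out the I2S audio, Xunfei STT, LLM WebSocket, and Baidu TTS calls.

```cpp
// Hypothetical stages of one conversation turn (names are illustrative only).
enum class State {
    WaitingForWake,  // listening for the wake word (online STT or ASRPRO module)
    Recording,       // capturing microphone audio
    Transcribing,    // waiting for the speech-to-text result
    QueryingLlm,     // waiting for the LLM reply
    Speaking         // playing synthesized speech and updating the screen
};

// Hypothetical events that move the assistant from one stage to the next.
enum class Event {
    WakeWordDetected,
    RecordingFinished,
    TranscriptReady,
    LlmReplyReady,
    PlaybackFinished,
    Error
};

// Returns the next state for a given state and event.
// When `continuous` is true, the end of playback starts a new recording,
// which mirrors the continuous-conversation behaviour described above.
// Assumption: any error returns to wake-word listening.
State next_state(State s, Event e, bool continuous) {
    if (e == Event::Error) {
        return State::WaitingForWake;
    }
    switch (s) {
        case State::WaitingForWake:
            return e == Event::WakeWordDetected ? State::Recording : s;
        case State::Recording:
            return e == Event::RecordingFinished ? State::Transcribing : s;
        case State::Transcribing:
            return e == Event::TranscriptReady ? State::QueryingLlm : s;
        case State::QueryingLlm:
            return e == Event::LlmReplyReady ? State::Speaking : s;
        case State::Speaking:
            if (e == Event::PlaybackFinished) {
                return continuous ? State::Recording : State::WaitingForWake;
            }
            return s;
    }
    return s;  // unreachable for valid enum values; keeps compilers quiet
}
```

In a sketch like this, the main loop would call next_state(current, event, true) each time a service callback fires. Events that do not apply to the current state leave it unchanged, so a late or duplicate callback cannot skip a stage.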
Quick Start & Requirements
main.cpp
and potentially flashing the ASRPRO module with custom wake words.
Highlighted Details
Maintenance & Community
The project is based on Esp32_VoiceChat_LLMs. Further community or maintenance details are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project relies heavily on external cloud services (Xunfei, Baidu TTS, LLMs), requiring active internet connectivity and potentially incurring costs. Offline wake-up requires a separate ASRPRO module. Music playback may be affected by NetEase Cloud Music server changes or limitations. Long or English song titles may not be recognized accurately by the STT service.