ESP32_AI_LLM by Explorerlowi

ESP32-based voice assistant for LLM interaction

created 1 year ago
437 stars

Top 69.3% on sourcepulse

Project Summary

This project provides an ESP32-based smart voice assistant that can interact with 15 large language models (LLMs), including ChatGPT, Claude, and Xunfei Spark. It targets hobbyists and developers who want to build custom voice-controlled AI devices with voice wake-up, continuous conversation, and music playback, with LLM responses shown on a small screen.

How It Works

The system leverages an ESP32 or ESP32-S3 microcontroller to manage audio input via an INMP441 microphone and audio output through a MAX98357 amplifier. Speech is processed by either the Xunfei STT (Speech-to-Text) service for online wake-up or an ASRPRO module for offline wake-up and command recognition. Recognized speech is then sent to one of the supported LLMs via WebSocket for text generation. The LLM's response is converted back to speech using Baidu's TTS (Text-to-Speech) service and displayed on a 1.8-inch RGB_TFT screen. Continuous conversation is enabled by automatically re-initiating recording after an LLM response.
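The conversation flow described above can be pictured as a small state machine. The following is a minimal sketch, not the project's actual code: the `State` and `Event` names, the `next_state` function, and the `continuous` flag are all hypothetical, chosen only to illustrate how recording is re-initiated after each LLM response.

```cpp
#include <cassert>

// Hypothetical states for one conversation turn (illustrative names only,
// not taken from the project's main.cpp).
enum class State { Idle, Recording, Transcribing, QueryingLlm, Speaking };

// Hypothetical events raised by the wake-word, STT, LLM, and TTS stages.
enum class Event { WakeWord, RecordingDone, TranscriptReady,
                   LlmReplyReady, PlaybackDone, Error };

// Pure transition function: returns the next state for (state, event).
// Events that do not apply to the current state leave it unchanged.
// Any error drops back to Idle, where the device waits for the wake word.
State next_state(State s, Event e, bool continuous) {
    if (e == Event::Error) return State::Idle;
    switch (s) {
        case State::Idle:
            return e == Event::WakeWord ? State::Recording : s;
        case State::Recording:
            return e == Event::RecordingDone ? State::Transcribing : s;
        case State::Transcribing:
            return e == Event::TranscriptReady ? State::QueryingLlm : s;
        case State::QueryingLlm:
            return e == Event::LlmReplyReady ? State::Speaking : s;
        case State::Speaking:
            if (e != Event::PlaybackDone) return s;
            // Continuous conversation: start recording again after the reply.
            return continuous ? State::Recording : State::Idle;
    }
    return State::Idle;
}

// Usage check: one full turn in continuous mode ends back in Recording.
void demo_conversation_loop() {
    State s = State::Idle;
    const Event turn[] = {Event::WakeWord, Event::RecordingDone,
                          Event::TranscriptReady, Event::LlmReplyReady,
                          Event::PlaybackDone};
    for (Event e : turn) s = next_state(s, e, /*continuous=*/true);
    assert(s == State::Recording);
}
```

Keeping the transition logic free of hardware calls means it can be compiled and checked on a desktop before being wired to the microphone, network, and amplifier code on the board.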

Quick Start & Requirements

  • Installation: Clone the repository, configure VS Code with PlatformIO, install the ASRPRO client software, and connect hardware components.
  • Prerequisites: ESP32/ESP32-S3 development board, INMP441 microphone, MAX98357 amplifier, 1.8-inch RGB_TFT screen, ASRPRO module, Xunfei AI services (LLM and STT), and optionally other LLM API keys.
  • Setup: Requires configuring API keys in main.cpp and potentially flashing the ASRPRO module with custom wake words.
  • Resources: Access to the Xunfei developer platform is mandatory.
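Because setup depends on several API keys being filled in, a start-up check for unfilled values can catch mistakes early. The sketch below is an assumption-laden illustration: the `Credential` struct, the `missing_credentials` helper, the credential names, and the `YOUR_` placeholder convention are all hypothetical and do not reflect how the project's main.cpp actually stores its keys.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical name/value pair for one configured secret.
struct Credential {
    std::string name;
    std::string value;
};

// Returns the names of credentials that are empty or still hold a
// placeholder. The "YOUR_" prefix is an assumed placeholder convention.
std::vector<std::string> missing_credentials(const std::vector<Credential>& creds) {
    std::vector<std::string> missing;
    for (const auto& c : creds) {
        const bool is_placeholder = c.value.rfind("YOUR_", 0) == 0;  // starts-with test
        if (c.value.empty() || is_placeholder) missing.push_back(c.name);
    }
    return missing;
}

// Usage check with made-up credential names and values.
void demo_missing_credentials() {
    const std::vector<Credential> creds = {
        {"xunfei_app_id", "abc123"},           // filled in
        {"xunfei_api_key", ""},                // left empty
        {"baidu_tts_key", "YOUR_BAIDU_KEY"},   // placeholder not replaced
    };
    const std::vector<std::string> missing = missing_credentials(creds);
    assert(missing.size() == 2);
}
```

On the device, a non-empty result could be printed to the serial console or the screen before any network request is attempted.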

Highlighted Details

  • Supports 15 LLMs including ChatGPT, Claude, Gemini, Grok, Mistral, and various Chinese models.
  • Features offline wake-up via ASRPRO with customizable wake words.
  • Includes web-based configuration for WiFi and LLM parameters.
  • Offers music playback from NetEase Cloud Music (non-VIP).
  • Provides "abstract entertainment" features triggered by specific voice commands.

Maintenance & Community

The project is based on Esp32_VoiceChat_LLMs. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project relies heavily on external cloud services (Xunfei, Baidu TTS, LLMs), requiring active internet connectivity and potentially incurring costs. Offline wake-up requires a separate ASRPRO module. Music playback may be affected by NetEase Cloud Music server changes or limitations. Long or English song titles may not be recognized accurately by the STT service.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 37 stars in the last 90 days
