This project demonstrates the feasibility of running a small Large Language Model (LLM) on an ESP32 microcontroller, targeting embedded developers and hobbyists interested in edge AI. It achieves this by optimizing the llama.cpp implementation for the ESP32's architecture, enabling on-device inference for basic natural language tasks.
How It Works
The project runs a 260K parameter tinyllamas model trained on the TinyStories dataset. The core implementation is based on llama.cpp, modified to use both cores of the ESP32-S3 and the ESP-DSP library, whose SIMD-accelerated routines speed up the dot-product operations that dominate inference. Combined with maxed-out CPU (240 MHz) and PSRAM (80 MHz) clocks, these changes yield roughly 19.13 tokens/second.
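To illustrate the approach described above, here is a minimal sketch (not the project's actual code) of a matrix-vector multiply that splits rows across the two cores with FreeRTOS and uses esp-dsp's dsps_dotprod_f32() for the SIMD-accelerated dot products; names such as matmul_dual_core and the job struct are assumptions:

    // Illustrative only: split a d x n matrix-vector multiply across both cores,
    // with each row's dot product computed by the esp-dsp SIMD routine.
    #include "freertos/FreeRTOS.h"
    #include "freertos/task.h"
    #include "freertos/semphr.h"
    #include "dsps_dotprod.h"   // esp-dsp: dsps_dotprod_f32()

    typedef struct {
        const float *w;          // weight matrix, d rows x n cols (row-major)
        const float *x;          // input vector, length n
        float *out;              // output vector, length d
        int n;                   // columns per row
        int row_start;           // first row handled by this worker
        int row_end;             // one past the last row
        SemaphoreHandle_t done;  // signalled when this worker finishes
    } matmul_job_t;

    // Each worker computes out[i] = dot(W[i, :], x) for its slice of rows.
    static void matmul_worker(void *arg)
    {
        matmul_job_t *job = (matmul_job_t *)arg;
        for (int i = job->row_start; i < job->row_end; i++) {
            dsps_dotprod_f32(job->w + (size_t)i * job->n, job->x, &job->out[i], job->n);
        }
        xSemaphoreGive(job->done);
        vTaskDelete(NULL);
    }

    // Split the rows roughly in half and pin one worker task to each core.
    static void matmul_dual_core(const float *w, const float *x, float *out, int d, int n)
    {
        SemaphoreHandle_t done0 = xSemaphoreCreateBinary();
        SemaphoreHandle_t done1 = xSemaphoreCreateBinary();
        matmul_job_t job0 = { w, x, out, n, 0, d / 2, done0 };
        matmul_job_t job1 = { w, x, out, n, d / 2, d, done1 };
        xTaskCreatePinnedToCore(matmul_worker, "mm0", 4096, &job0, 5, NULL, 0);
        xTaskCreatePinnedToCore(matmul_worker, "mm1", 4096, &job1, 5, NULL, 1);
        xSemaphoreTake(done0, portMAX_DELAY);   // wait for both halves to finish
        xSemaphoreTake(done1, portMAX_DELAY);
        vSemaphoreDelete(done0);
        vSemaphoreDelete(done1);
    }

In practice, a real port would more likely keep two persistent worker tasks alive and hand them row ranges per layer, since creating and deleting tasks for every matrix multiply adds noticeable overhead at these model sizes.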
Quick Start & Requirements
Hardware: an ESP32-S3 with 2MB of PSRAM. Building and flashing follow the standard ESP-IDF workflow:
    idf.py build
    idf.py -p /dev/{DEVICE_PORT} flash
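The README does not detail toolchain setup; assuming a standard ESP-IDF installation, the ESP32-S3 target would typically be selected before building, and the serial monitor can be attached after flashing to view the device's output:
    idf.py set-target esp32s3
    idf.py -p /dev/{DEVICE_PORT} monitor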
Highlighted Details
- Roughly 19.13 tokens/second on an ESP32-S3 (240 MHz CPU, 80 MHz PSRAM)
- 260K parameter tinyllamas model trained on TinyStories
- Dual-core execution with ESP-DSP SIMD-accelerated dot products
Maintenance & Community
No information on maintainers, community channels, or roadmap is provided in the README. The repository was last updated roughly 11 months ago and appears inactive.
Licensing & Compatibility
The project appears to be based on llama.cpp, which is released under the MIT license; however, the license of this fork is not explicitly stated in the README.
Limitations & Caveats
The project explicitly states that while running an LLM on the ESP32 is possible, it is "probably not very useful" given the extremely small model size. The hardware requirement is also specific: an ESP32-S3 with 2MB of PSRAM.