This project demonstrates the feasibility of running a small Large Language Model (LLM) on an ESP32 microcontroller, targeting embedded developers and hobbyists interested in edge AI. It achieves this by optimizing the llama.cpp implementation for the ESP32's architecture, enabling on-device inference for basic natural language tasks.
How It Works
The project runs a 260K parameter tinyllamas model trained on the TinyStories dataset. The core implementation is based on llama.cpp, modified to use both cores of the ESP32-S3 and the ESP-DSP library, whose SIMD-accelerated routines speed up the dot-product operations that dominate inference. Combined with maxed-out CPU (240 MHz) and PSRAM (80 MHz) clocks, these changes yield roughly 19.13 tokens/second.
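To illustrate the approach described above, here is a minimal sketch (not the project's actual code) of a matrix-vector multiply that splits rows across the two cores with FreeRTOS and uses esp-dsp's dsps_dotprod_f32() for the SIMD-accelerated dot products; names such as matmul_dual_core and the job struct are assumptions:

    // Illustrative only: split a d x n matrix-vector multiply across both cores,
    // with each row's dot product computed by the esp-dsp SIMD routine.
    #include "freertos/FreeRTOS.h"
    #include "freertos/task.h"
    #include "freertos/semphr.h"
    #include "dsps_dotprod.h"   // esp-dsp: dsps_dotprod_f32()

    typedef struct {
        const float *w;          // weight matrix, d rows x n cols (row-major)
        const float *x;          // input vector, length n
        float *out;              // output vector, length d
        int n;                   // columns per row
        int row_start;           // first row handled by this worker
        int row_end;             // one past the last row
        SemaphoreHandle_t done;  // signalled when this worker finishes
    } matmul_job_t;

    // Each worker computes out[i] = dot(W[i, :], x) for its slice of rows.
    static void matmul_worker(void *arg)
    {
        matmul_job_t *job = (matmul_job_t *)arg;
        for (int i = job->row_start; i < job->row_end; i++) {
            dsps_dotprod_f32(job->w + (size_t)i * job->n, job->x, &job->out[i], job->n);
        }
        xSemaphoreGive(job->done);
        vTaskDelete(NULL);
    }

    // Split the rows roughly in half and pin one worker task to each core.
    static void matmul_dual_core(const float *w, const float *x, float *out, int d, int n)
    {
        SemaphoreHandle_t done0 = xSemaphoreCreateBinary();
        SemaphoreHandle_t done1 = xSemaphoreCreateBinary();
        matmul_job_t job0 = { w, x, out, n, 0, d / 2, done0 };
        matmul_job_t job1 = { w, x, out, n, d / 2, d, done1 };
        xTaskCreatePinnedToCore(matmul_worker, "mm0", 4096, &job0, 5, NULL, 0);
        xTaskCreatePinnedToCore(matmul_worker, "mm1", 4096, &job1, 5, NULL, 1);
        xSemaphoreTake(done0, portMAX_DELAY);   // wait for both halves to finish
        xSemaphoreTake(done1, portMAX_DELAY);
        vSemaphoreDelete(done0);
        vSemaphoreDelete(done1);
    }

In practice, a real port would more likely keep two persistent worker tasks alive and hand them row ranges per layer, since creating and deleting tasks for every matrix multiply adds noticeable overhead at these model sizes.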
Quick Start & Requirements
Hardware: an ESP32-S3 with 2MB of PSRAM. Building and flashing follow the standard ESP-IDF workflow:
    idf.py build
    idf.py -p /dev/{DEVICE_PORT} flash
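The README does not detail toolchain setup; assuming a standard ESP-IDF installation, the ESP32-S3 target would typically be selected before building, and the serial monitor can be attached after flashing to view the device's output:
    idf.py set-target esp32s3
    idf.py -p /dev/{DEVICE_PORT} monitor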
Highlighted Details
- Roughly 19.13 tokens/second on an ESP32-S3 (240 MHz CPU, 80 MHz PSRAM)
- 260K parameter tinyllamas model trained on TinyStories
- Dual-core execution with ESP-DSP SIMD-accelerated dot products
Maintenance & Community
No information on maintainers, community channels, or roadmap is provided in the README. The repository was last updated roughly 11 months ago and appears inactive.
Licensing & Compatibility
The project appears to be based on llama.cpp, which is released under the MIT license; however, the license of this fork is not explicitly stated in the README.
Limitations & Caveats
The project explicitly states that while running an LLM on the ESP32 is possible, it is "probably not very useful" given the extremely small model size. The hardware requirement is also specific: an ESP32-S3 with 2MB of PSRAM.