llama4micro  by maxbbraun

LLM inference on a microcontroller

Created 1 year ago
533 stars

Top 59.5% on SourcePulse

Project Summary

This project demonstrates running a "large" language model on a microcontroller, specifically the Coral Dev Board Micro with 64MB of RAM. It targets embedded systems developers and researchers interested in pushing the boundaries of on-device AI, enabling generative text capabilities in resource-constrained environments.

How It Works

The project adapts the llama2.c implementation and tinyllamas checkpoints, trained on the TinyStories dataset, to run on the Coral Dev Board Micro's 800 MHz Arm Cortex-M7 CPU. For image input, it leverages the board's Edge TPU with a compiled YOLOv5 model for object detection. The detected object forms the initial prompt for the LLM, generating text output streamed via serial.
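The pipeline above can be sketched conceptually. The stubs below are hypothetical stand-ins (the actual project is C++ on coralmicro, and these function names are not from its API); they only illustrate the flow from a detected object to a seeded prompt to streamed text.

```python
# Conceptual sketch of the llama4micro flow (hypothetical stubs, not
# the project's real C++ API): camera frame -> YOLOv5 on the Edge TPU
# -> detected label seeds the LLM prompt -> tokens stream to serial.

def detect_object(frame):
    """Stand-in for YOLOv5 object detection on the Edge TPU."""
    return "cat"  # top-scoring class label for the frame

def generate_story(prompt, max_tokens=32):
    """Stand-in for llama2.c-style autoregressive decoding."""
    tokens = prompt.split()
    # A real model emits ~2.5 tokens/s on the Cortex-M7.
    while len(tokens) < max_tokens:
        tokens.append("<tok>")  # placeholder for a sampled token
    return " ".join(tokens)

frame = None  # placeholder for a camera capture
label = detect_object(frame)
prompt = f"Once upon a time, there was a {label}"
print(generate_story(prompt))
```

The key design point is that the Edge TPU handles vision while the Cortex-M7 CPU handles language generation, so the two accelerators split the workload.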

Quick Start & Requirements

  • Install: Clone the repo with submodules, build with cmake and make, then create a Python virtual environment (python3 -m venv venv) and flash with python ../coralmicro/scripts/flashtool.py.
  • Prerequisites: Coral Dev Board Micro, FreeRTOS toolchain, Python 3.x for flashing.
  • Setup: Model conversion and flashing required.
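A sketch of the steps above as shell commands. This assumes the standard CMake out-of-source layout and the flashtool path quoted in the install bullet; exact flags and paths may differ from the project's README, and flashing requires a connected Coral Dev Board Micro.

```shell
# Clone with submodules (pulls in llama2.c, coralmicro, yolov5)
git clone --recursive https://github.com/maxbbraun/llama4micro.git
cd llama4micro

# Build with CMake + Make (assumes the FreeRTOS toolchain is set up)
mkdir -p build && cd build
cmake ..
make

# Create a Python env and flash via coralmicro's flashtool
python3 -m venv venv
source venv/bin/activate
python ../coralmicro/scripts/flashtool.py
```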

Highlighted Details

  • LLM inference on an 800 MHz Arm Cortex-M7 CPU.
  • Object detection on camera images via the Edge TPU and YOLOv5.
  • Generates text at ~2.5 tokens per second.
  • Model loading takes ~7 seconds on startup.
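Combining the two figures above gives a rough end-to-end latency estimate for a short generation (the 100-token output length is an illustrative assumption, not a project figure):

```python
# Back-of-the-envelope latency from the quoted figures:
# ~2.5 tokens/s generation, ~7 s model load on startup.
tokens = 100                  # assumed length of a short story
load_s = 7.0                  # one-time model load
gen_s = tokens / 2.5          # generation time at 2.5 tok/s
total_s = load_s + gen_s
print(f"{total_s:.0f} s end-to-end for {tokens} tokens")  # -> 47 s
```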

Maintenance & Community

The project is a personal endeavor by maxbbraun. No specific community channels or roadmap are indicated in the README.

Licensing & Compatibility

The project itself appears to be MIT licensed, but it incorporates submodules from other projects (llama2.c, coralmicro, yolov5) which may have different licenses. Compatibility for commercial use depends on the licenses of these submodules.

Limitations & Caveats

The quality of the generated stories from the smaller model versions is described as "not ideal" but "somewhat coherent." The second Arm Cortex-M4 CPU core on the board is currently unused.

Health Check
Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
3 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Georgi Gerganov (author of llama.cpp and whisper.cpp), and 1 more.

LLMFarm by guinmoon

0.3%
2k
iOS/MacOS app for local LLM inference
Created 2 years ago
Updated 3 weeks ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 5 more.

streaming-llm by mit-han-lab

0.1%
7k
Framework for efficient LLM streaming
Created 1 year ago
Updated 1 year ago