llama4micro  by maxbbraun

LLM inference on a microcontroller

Created 1 year ago
533 stars

Top 59.5% on SourcePulse

Project Summary

This project demonstrates running a "large" language model on a microcontroller, specifically the Coral Dev Board Micro with 64MB of RAM. It targets embedded systems developers and researchers interested in pushing the boundaries of on-device AI, enabling generative text capabilities in resource-constrained environments.

How It Works

The project adapts the llama2.c implementation and tinyllamas checkpoints, trained on the TinyStories dataset, to run on the Coral Dev Board Micro's 800 MHz Arm Cortex-M7 CPU. For image input, it leverages the board's Edge TPU with a compiled YOLOv5 model for object detection. The detected object forms the initial prompt for the LLM, generating text output streamed via serial.
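The pipeline above can be sketched conceptually. The stubs below are hypothetical stand-ins (the actual project is C++ on coralmicro, and these function names are not from its API); they only illustrate the flow from a detected object to a seeded prompt to streamed text.

```python
# Conceptual sketch of the llama4micro flow (hypothetical stubs, not
# the project's real C++ API): camera frame -> YOLOv5 on the Edge TPU
# -> detected label seeds the LLM prompt -> tokens stream to serial.

def detect_object(frame):
    """Stand-in for YOLOv5 object detection on the Edge TPU."""
    return "cat"  # top-scoring class label for the frame

def generate_story(prompt, max_tokens=32):
    """Stand-in for llama2.c-style autoregressive decoding."""
    tokens = prompt.split()
    # A real model emits ~2.5 tokens/s on the Cortex-M7.
    while len(tokens) < max_tokens:
        tokens.append("<tok>")  # placeholder for a sampled token
    return " ".join(tokens)

frame = None  # placeholder for a camera capture
label = detect_object(frame)
prompt = f"Once upon a time, there was a {label}"
print(generate_story(prompt))
```

The key design point is that the Edge TPU handles vision while the Cortex-M7 CPU handles language generation, so the two accelerators split the workload.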

Quick Start & Requirements

  • Install: Clone the repo with submodules, build with cmake and make, then create a Python virtual environment (python3 -m venv venv) and flash with python ../coralmicro/scripts/flashtool.py.
  • Prerequisites: Coral Dev Board Micro, FreeRTOS toolchain, Python 3.x for flashing.
  • Setup: Model conversion and flashing required.
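A sketch of the steps above as shell commands. This assumes the standard CMake out-of-source layout and the flashtool path quoted in the install bullet; exact flags and paths may differ from the project's README, and flashing requires a connected Coral Dev Board Micro.

```shell
# Clone with submodules (pulls in llama2.c, coralmicro, yolov5)
git clone --recursive https://github.com/maxbbraun/llama4micro.git
cd llama4micro

# Build with CMake + Make (assumes the FreeRTOS toolchain is set up)
mkdir -p build && cd build
cmake ..
make

# Create a Python env and flash via coralmicro's flashtool
python3 -m venv venv
source venv/bin/activate
python ../coralmicro/scripts/flashtool.py
```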

Highlighted Details

  • LLM inference on an 800 MHz Arm Cortex-M7 CPU.
  • Object detection on camera images via the Edge TPU and YOLOv5.
  • Generates text at ~2.5 tokens per second.
  • Model loading takes ~7 seconds on startup.
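Combining the two figures above gives a rough end-to-end latency estimate for a short generation (the 100-token output length is an illustrative assumption, not a project figure):

```python
# Back-of-the-envelope latency from the quoted figures:
# ~2.5 tokens/s generation, ~7 s model load on startup.
tokens = 100                  # assumed length of a short story
load_s = 7.0                  # one-time model load
gen_s = tokens / 2.5          # generation time at 2.5 tok/s
total_s = load_s + gen_s
print(f"{total_s:.0f} s end-to-end for {tokens} tokens")  # -> 47 s
```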

Maintenance & Community

The project is a personal endeavor by maxbbraun. No specific community channels or roadmap are indicated in the README.

Licensing & Compatibility

The project itself appears to be MIT licensed, but it incorporates submodules from other projects (llama2.c, coralmicro, yolov5) which may have different licenses. Compatibility for commercial use depends on the licenses of these submodules.

Limitations & Caveats

The quality of the generated stories from the smaller model versions is described as "not ideal" but "somewhat coherent." The second Arm Cortex-M4 CPU core on the board is currently unused.

Health Check
Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
3 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Georgi Gerganov (author of llama.cpp and whisper.cpp), and 1 more.

LLMFarm by guinmoon

0.3%
2k
iOS/MacOS app for local LLM inference
Created 2 years ago
Updated 3 weeks ago
Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 5 more.

streaming-llm by mit-han-lab

0.1%
7k
Framework for efficient LLM streaming
Created 1 year ago
Updated 1 year ago