rkllama by NotPunchnox

LLM server and client for Rockchip NPUs

Created 8 months ago
286 stars

Top 91.6% on SourcePulse

View on GitHub
Project Summary

RKLLama provides an Ollama-like server and client for running Large Language Models (LLMs) on Rockchip RK3588/3576 NPUs. It targets developers and users of these embedded systems, enabling efficient on-device AI inference with NPU acceleration.

How It Works

RKLLama uses the rkllm-runtime library (v1.2.1b1) to execute LLMs directly on the Rockchip NPU, giving a significant performance advantage over CPU-only inference. It provides a REST API server (app.py) and a command-line client (client.py) for interacting with models, and offers partial Ollama API compatibility so existing Ollama-based tools can connect to it.
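
For illustration, a single HTTP request is enough to run a prompt once the server is up. The sketch below is a minimal, unverified Python example: it assumes the Ollama-style /api/generate endpoint (listed under Highlighted Details) accepts a request body shaped like Ollama's, that the server listens on port 8080 as in the Docker command under Quick Start, and the model name is a placeholder.

    import requests

    # Placeholder model name; substitute one already pulled onto the device.
    payload = {
        "model": "qwen2.5:1.5b",
        "prompt": "Explain in one sentence what an NPU is.",
        "stream": False,  # ask for a single JSON response instead of a stream
    }

    # Port 8080 matches the Docker port mapping shown under Quick Start.
    resp = requests.post("http://localhost:8080/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json())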

Quick Start & Requirements

  • Installation: Clone the repository and run ./setup.sh for a standard install, or use docker pull ghcr.io/notpunchnox/rkllama:main followed by docker run --privileged -p 8080:8080 ghcr.io/notpunchnox/rkllama:main for Docker. A quick smoke-test sketch follows this list.
  • Hardware: Rockchip RK3588(S) or RK3576 devices (e.g., Orange Pi 5 series) with 16GB RAM recommended.
  • OS: Ubuntu 24.04 arm64 or Armbian Linux 6.1.99-vendor-rk35xx.
  • Dependencies: Python 3.8-3.12.
  • Docs: API REST Documentation
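
One way to smoke-test a fresh install is to point an existing Ollama client at the server, since RKLLama exposes a partially Ollama-compatible API. The sketch below uses the third-party ollama Python package and rests on assumptions: that the server is reachable on port 8080 (the Docker mapping above), that /api/chat behaves closely enough to Ollama's for the stock client, and that the model name is a placeholder.

    from ollama import Client  # third-party package: pip install ollama

    # Point the stock Ollama client at the RKLLama server instead of a local
    # Ollama daemon; port 8080 matches the Docker mapping above.
    client = Client(host="http://localhost:8080")

    # Placeholder model name; use one actually loaded on the device.
    reply = client.chat(
        model="qwen2.5:1.5b",
        messages=[{"role": "user", "content": "Say hello from the RK3588 NPU."}],
    )
    print(reply["message"]["content"])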

Highlighted Details

  • NPU acceleration for LLM inference on Rockchip platforms.
  • Partial Ollama API compatibility (/api/chat, /api/generate).
  • Full support for tool/function calling with multiple LLM formats (see the sketch after this list).
  • Direct model pulling from Hugging Face with simplified naming.
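
To make the tool-calling bullet concrete, here is a rough sketch that sends an Ollama-style tools array to /api/chat. The payload schema is borrowed from Ollama's API and is an assumption rather than a confirmed RKLLama contract; the function definition and model name are hypothetical.

    import requests

    # Hypothetical tool definition, written in Ollama's /api/chat "tools" schema.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    payload = {
        "model": "qwen2.5:1.5b",  # placeholder model name
        "messages": [{"role": "user", "content": "What is the weather in Lyon?"}],
        "tools": tools,
        "stream": False,
    }

    resp = requests.post("http://localhost:8080/api/chat", json=payload, timeout=120)
    resp.raise_for_status()
    # If the model chooses to call the tool, the response is expected to carry a
    # tool_calls entry, mirroring Ollama's shape (an assumption, not confirmed).
    print(resp.json())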

Maintenance & Community

Active development with contributions from ichlaffterlalu, TomJacobsUK, and Yoann Vanitou. Upcoming features include OpenAI API compatibility and multimodal model support.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is at version 0.0.42 and lists several upcoming features, indicating ongoing development. Ollama API compatibility is partial, and conversion tools for GGUF/HF models are noted as "coming soon."

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 2
  • Star History: 19 stars in the last 30 days
