rkllama by NotPunchnox

LLM server and client for Rockchip NPUs

Created 8 months ago
286 stars

Top 91.6% on SourcePulse

View on GitHub
Project Summary

RKLLama provides an Ollama-like server and client for running Large Language Models (LLMs) on Rockchip RK3588/3576 NPUs. It targets developers and users of these embedded systems, enabling efficient on-device AI inference with NPU acceleration.

How It Works

RKLLama uses the rkllm-runtime library (v1.2.1b1) to execute LLMs directly on the Rockchip NPU, giving a significant performance advantage over CPU-only inference. It provides a REST API server (app.py) and a command-line client (client.py) for interacting with models, and offers partial Ollama API compatibility so existing Ollama-based tools can connect to it.
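
For illustration, a single HTTP request is enough to run a prompt once the server is up. The sketch below is a minimal, unverified Python example: it assumes the Ollama-style /api/generate endpoint (listed under Highlighted Details) accepts a request body shaped like Ollama's, that the server listens on port 8080 as in the Docker command under Quick Start, and the model name is a placeholder.

    import requests

    # Placeholder model name; substitute one already pulled onto the device.
    payload = {
        "model": "qwen2.5:1.5b",
        "prompt": "Explain in one sentence what an NPU is.",
        "stream": False,  # ask for a single JSON response instead of a stream
    }

    # Port 8080 matches the Docker port mapping shown under Quick Start.
    resp = requests.post("http://localhost:8080/api/generate", json=payload, timeout=120)
    resp.raise_for_status()
    print(resp.json())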

Quick Start & Requirements

  • Installation: Clone the repository and run ./setup.sh for a standard install, or use docker pull ghcr.io/notpunchnox/rkllama:main followed by docker run --privileged -p 8080:8080 ghcr.io/notpunchnox/rkllama:main for Docker. A quick smoke-test sketch follows this list.
  • Hardware: Rockchip RK3588(S) or RK3576 devices (e.g., Orange Pi 5 series) with 16GB RAM recommended.
  • OS: Ubuntu 24.04 arm64 or Armbian Linux 6.1.99-vendor-rk35xx.
  • Dependencies: Python 3.8-3.12.
  • Docs: API REST Documentation
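
One way to smoke-test a fresh install is to point an existing Ollama client at the server, since RKLLama exposes a partially Ollama-compatible API. The sketch below uses the third-party ollama Python package and rests on assumptions: that the server is reachable on port 8080 (the Docker mapping above), that /api/chat behaves closely enough to Ollama's for the stock client, and that the model name is a placeholder.

    from ollama import Client  # third-party package: pip install ollama

    # Point the stock Ollama client at the RKLLama server instead of a local
    # Ollama daemon; port 8080 matches the Docker mapping above.
    client = Client(host="http://localhost:8080")

    # Placeholder model name; use one actually loaded on the device.
    reply = client.chat(
        model="qwen2.5:1.5b",
        messages=[{"role": "user", "content": "Say hello from the RK3588 NPU."}],
    )
    print(reply["message"]["content"])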

Highlighted Details

  • NPU acceleration for LLM inference on Rockchip platforms.
  • Partial Ollama API compatibility (/api/chat, /api/generate).
  • Full support for tool/function calling with multiple LLM formats (see the sketch after this list).
  • Direct model pulling from Hugging Face with simplified naming.
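
To make the tool-calling bullet concrete, here is a rough sketch that sends an Ollama-style tools array to /api/chat. The payload schema is borrowed from Ollama's API and is an assumption rather than a confirmed RKLLama contract; the function definition and model name are hypothetical.

    import requests

    # Hypothetical tool definition, written in Ollama's /api/chat "tools" schema.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    payload = {
        "model": "qwen2.5:1.5b",  # placeholder model name
        "messages": [{"role": "user", "content": "What is the weather in Lyon?"}],
        "tools": tools,
        "stream": False,
    }

    resp = requests.post("http://localhost:8080/api/chat", json=payload, timeout=120)
    resp.raise_for_status()
    # If the model chooses to call the tool, the response is expected to carry a
    # tool_calls entry, mirroring Ollama's shape (an assumption, not confirmed).
    print(resp.json())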

Maintenance & Community

Active development with contributions from ichlaffterlalu, TomJacobsUK, and Yoann Vanitou. Upcoming features include OpenAI API compatibility and multimodal model support.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is at version 0.0.42 and lists several upcoming features, indicating ongoing development. Ollama API compatibility is partial, and conversion tools for GGUF/HF models are noted as "coming soon."

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 2
  • Star History: 19 stars in the last 30 days
