LLM server and client for Rockchip NPUs
RKLLama provides an Ollama-like server and client for running Large Language Models (LLMs) on Rockchip RK3588/3576 NPUs. It targets developers and users of these embedded systems, enabling efficient on-device AI inference with NPU acceleration.
How It Works
RKLLama leverages the rkllm-runtime library (v1.2.1b1) to execute LLMs directly on the Rockchip NPU, offering a significant performance advantage over CPU-only inference. It provides a REST API server (app.py) and a command-line client (client.py) for model interaction, with partial Ollama API compatibility for integration with existing tools.
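Because the server exposes Ollama-style routes, it can be driven with any HTTP client. Below is a minimal sketch in Python, assuming the server listens on port 8080 (as in the Docker command in the next section) and that /api/generate accepts the Ollama request shape; the model name is a placeholder for one you have installed.

```python
import requests

# Assumes the RKLLama server is reachable on localhost:8080 and that
# /api/generate follows the Ollama request/response shape; "qwen2.5-3b"
# is a placeholder model name, not one shipped with the project.
resp = requests.post(
    "http://localhost:8080/api/generate",
    json={
        "model": "qwen2.5-3b",   # placeholder: use a model you have installed
        "prompt": "Why use an NPU for LLM inference?",
        "stream": False,          # request a single JSON response
    },
    timeout=120,
)
resp.raise_for_status()
# Ollama puts the completion under "response"; RKLLama's shape may differ.
print(resp.json().get("response"))
```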
Quick Start & Requirements
Run ./setup.sh for a standard installation, or use Docker:

```
docker pull ghcr.io/notpunchnox/rkllama:main
docker run --privileged -p 8080:8080 ghcr.io/notpunchnox/rkllama:main
```
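Once the container or local install is running, a quick connectivity check can confirm the server is reachable. The sketch below assumes the server mirrors Ollama's GET /api/tags model-listing route; given the partial compatibility, the actual route may differ.

```python
import requests

# Minimal health check: assumes port 8080 is mapped as in the docker run
# command above and that GET /api/tags lists installed models (Ollama-style).
try:
    resp = requests.get("http://localhost:8080/api/tags", timeout=5)
    resp.raise_for_status()
    models = resp.json().get("models", [])
    print(f"Server up, {len(models)} model(s) available")
except requests.RequestException as exc:
    print(f"Server not reachable: {exc}")
```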
Highlighted Details
Partial Ollama API compatibility, including the /api/chat and /api/generate endpoints.
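For chat-style interaction, the /api/chat endpoint can be exercised the same way. This sketch assumes the Ollama message schema and response shape; the model name is again a placeholder.

```python
import requests

# Ollama-style chat request; the "messages" schema is assumed from the
# Ollama API, and "qwen2.5-3b" is a placeholder model name.
resp = requests.post(
    "http://localhost:8080/api/chat",
    json={
        "model": "qwen2.5-3b",  # placeholder: use a model you have installed
        "messages": [
            {"role": "user", "content": "Summarize what an NPU does."},
        ],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
# Ollama returns the assistant turn under "message"; shape may differ here.
print(resp.json()["message"]["content"])
```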
Maintenance & Community
Active development with contributions from ichlaffterlalu, TomJacobsUK, and Yoann Vanitou. Upcoming features include OpenAI API compatibility and multimodal model support.
Licensing & Compatibility
The repository does not explicitly state a license, so suitability for commercial use or closed-source linking is unspecified.
Limitations & Caveats
The project is at version 0.0.42 and lists several upcoming features, indicating ongoing development. Ollama API compatibility is partial, and conversion tools for GGUF/HF models are noted as "coming soon."