This repository provides the RKLLM software stack, enabling users to deploy AI models, particularly Large Language Models (LLMs), on Rockchip NPUs. It targets developers and researchers working with Rockchip's RK3588, RK3576, and RK3562 series platforms, offering accelerated LLM inference and multimodal capabilities.
How It Works
The stack has two parts: RKLLM-Toolkit, which converts and quantizes models on a PC, and RKLLM Runtime, which runs inference on the device through a C/C++ API. A model is first converted to the RKLLM format and then executed on the board, where the runtime relies on the RKNPU kernel driver to interact with the hardware. Together they give Rockchip NPUs a dedicated toolchain and runtime for LLM deployment on edge devices.
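The PC-side half of this flow (load, quantize, export) can be sketched in Python. This is a minimal sketch, not the project's reference script: the `rkllm.api.RKLLM` class and its `load_huggingface`, `build`, and `export_rkllm` calls, along with the parameter names and the `w8a8` dtype, are assumptions recalled from the toolkit's example scripts and should be checked against the SDK version in use. `normalize_platform` and `convert_to_rkllm` are hypothetical helper names introduced here.

```python
from pathlib import Path

# Platforms named in this README.
SUPPORTED_PLATFORMS = ("rk3588", "rk3576", "rk3562")


def normalize_platform(name: str) -> str:
    """Lower-case a platform name and reject anything this README does not list."""
    platform = name.strip().lower()
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(
            f"unsupported platform {name!r}; expected one of {SUPPORTED_PLATFORMS}"
        )
    return platform


def convert_to_rkllm(model_dir: str, out_path: str, platform: str = "rk3588") -> Path:
    """Convert a Hugging Face model directory to an .rkllm file (PC side)."""
    target = normalize_platform(platform)

    # Imported lazily so normalize_platform works without the toolkit installed.
    # ASSUMPTION: module, method, and parameter names below follow the toolkit's
    # example scripts as recalled; verify them against your SDK version.
    from rkllm.api import RKLLM

    llm = RKLLM()
    if llm.load_huggingface(model=model_dir) != 0:
        raise RuntimeError("loading the model failed")
    if llm.build(do_parallel=True, optimization_level=1,
                 quantized_dtype="w8a8", target_platform=target) != 0:
        raise RuntimeError("building (quantizing) the model failed")
    if llm.export_rkllm(out_path) != 0:
        raise RuntimeError("exporting the .rkllm file failed")
    return Path(out_path)
```

The exported file is what RKLLM Runtime loads on the board through its C API; that device-side half is not shown here.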
Quick Start & Requirements
- Installation: Download the SDK from RKLLM_SDK or fetch the code via `git clone https://github.com/airockchip/rknn-toolkit2`.
- Prerequisites: Python 3.8-3.12. For Python 3.12, set `export BUILD_CUDA_EXT=0`. Potential `libomp.so` issues may require copying the library manually from the toolchain.
- Resources: The performance benchmarks report TTFT, Tokens/s, and memory usage, all of which vary with model size, quantization, and platform (RK3562, RK3576, RK3588).
- Links: RKLLM_SDK, rkllm_model_zoo, Examples.
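The prerequisites above can be checked before installing anything. The following is a minimal sketch: `preflight` is a hypothetical helper, not part of the SDK, and it encodes only the two rules stated in this list (the supported Python range, and `BUILD_CUDA_EXT=0` for Python 3.12).

```python
import os
import sys

MIN_PY = (3, 8)   # lowest supported Python version per this README
MAX_PY = (3, 12)  # highest supported Python version per this README


def preflight(version=None, env=None):
    """Return a list of problems found; an empty list means the checks passed."""
    version = tuple(version or sys.version_info[:2])[:2]
    env = os.environ if env is None else env
    problems = []

    if not (MIN_PY <= version <= MAX_PY):
        problems.append(
            f"Python {version[0]}.{version[1]} is outside the supported range 3.8-3.12"
        )

    # The README asks for this variable only on Python 3.12.
    if version == (3, 12) and env.get("BUILD_CUDA_EXT") != "0":
        problems.append("Python 3.12 detected: run `export BUILD_CUDA_EXT=0` first")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Passing `version` and `env` explicitly, as in `preflight((3, 12), {"BUILD_CUDA_EXT": "0"})`, makes the check easy to exercise without changing the real environment.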
Highlighted Details
- Supports a wide range of LLMs including Llama, Qwen, Phi, ChatGLM3, Gemma, InternLM2, MiniCPM, and multimodal models like Qwen2-VL and MiniCPM-V.
- Recent updates (v1.2.0) include custom model conversion, chat_template configuration, multi-turn dialogue, prompt cache reuse, 16K context length, GRQ Int4 quantization, GPTQ-Int8 support, and RK3562 platform compatibility.
- Performance benchmarks are provided for various models across different Rockchip platforms, detailing Time To First Token (TTFT) and inference speed (Tokens/s).
- Includes examples for multimodal deployment, API usage, and an API server.
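To make the two benchmark metrics concrete, here is a minimal sketch of how TTFT and Tokens/s could be derived from per-token timestamps. The definitions are assumptions, since this summary does not state the benchmark methodology: TTFT is taken as the delay from request to first token, and Tokens/s as the decode-phase rate, excluding the first token. `GenerationStats` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GenerationStats:
    ttft_s: float        # time to first token, in seconds
    tokens_per_s: float  # decode-phase throughput


def summarize(request_time: float, token_times: Sequence[float]) -> GenerationStats:
    """Compute TTFT and decode throughput from token arrival timestamps (seconds)."""
    if not token_times:
        raise ValueError("at least one token timestamp is required")

    ttft = token_times[0] - request_time

    # ASSUMPTION: throughput counts only tokens after the first one, so the
    # prefill cost captured by TTFT is not counted twice.
    decoded = len(token_times) - 1
    decode_time = token_times[-1] - token_times[0]
    rate = decoded / decode_time if decoded > 0 and decode_time > 0 else 0.0

    return GenerationStats(ttft_s=ttft, tokens_per_s=rate)
```

For example, `summarize(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])` yields a TTFT of 0.5 s and 4.0 tokens/s: four tokens arrive during the one second that follows the first token.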
Maintenance & Community
- The project is actively maintained, with recent updates in v1.2.0.
- For deploying additional AI models, the README points to RKNN Toolkit2, a separate SDK.
Licensing & Compatibility
- The provided README snippet does not state a license. The project is published by Rockchip for its own hardware ecosystem, so check the explicit license terms before commercial use or closed-source linking.
Limitations & Caveats
- The README notes potential `libomp.so` issues on certain platforms that require manual intervention. Python 3.12 requires setting `BUILD_CUDA_EXT=0`. Supported hardware beyond the listed Rockchip series is not detailed.