rknn-llm by airockchip

SDK for deploying AI models on Rockchip chips

Created 1 year ago

1,172 stars

Top 33.1% on SourcePulse

1 Expert Loves This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Project Summary

This repository provides the RKLLM software stack, enabling users to deploy AI models, particularly Large Language Models (LLMs), on Rockchip NPUs. It targets developers and researchers working with Rockchip's RK3588, RK3576, and RK3562 series platforms, offering accelerated LLM inference and multimodal capabilities.

How It Works

The stack has two parts: RKLLM-Toolkit, which runs on a PC and handles model conversion and quantization, and RKLLM Runtime, which exposes a C/C++ API for on-device inference. A model is first converted to the RKLLM format on the PC, then loaded on the board through the runtime API, which relies on the RKNPU kernel driver to talk to the NPU hardware. The result is a dedicated toolchain and runtime for LLM deployment on Rockchip edge devices.
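
As a rough illustration of the PC-side half of this flow, the Python sketch below loads a Hugging Face model, quantizes it for a target platform, and exports an .rkllm file. The rkllm.api.RKLLM class and its load_huggingface, build, and export_rkllm calls are written from memory of the toolkit's published examples, so the exact parameter names are assumptions to check against the installed RKLLM-Toolkit version. The convert_to_rkllm helper and its toolkit_cls parameter are hypothetical names introduced here.

```python
def convert_to_rkllm(model_dir, out_path, target_platform="rk3588",
                     quantized_dtype="w8a8", toolkit_cls=None):
    """Convert a Hugging Face model directory to an .rkllm file (sketch)."""
    if toolkit_cls is None:
        # Assumption: RKLLM-Toolkit is installed and exposes rkllm.api.RKLLM.
        from rkllm.api import RKLLM
        toolkit_cls = RKLLM

    llm = toolkit_cls()
    # Each step is assumed to return 0 on success, as in the toolkit examples.
    steps = [
        ("load", lambda: llm.load_huggingface(model=model_dir)),
        ("build", lambda: llm.build(do_parallel=True,
                                    optimization_level=1,
                                    quantized_dtype=quantized_dtype,
                                    target_platform=target_platform)),
        ("export", lambda: llm.export_rkllm(out_path)),
    ]
    for name, step in steps:
        ret = step()
        if ret != 0:
            raise RuntimeError(f"RKLLM {name} step failed with code {ret}")
    return out_path
```

The toolkit_cls parameter exists only so the flow can be exercised without the toolkit installed; in normal use it is left as None. The exported file is then copied to the board and loaded by RKLLM Runtime.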

Quick Start & Requirements

Installation: Download the SDK package from RKLLM_SDK, or clone the repository with git clone https://github.com/airockchip/rknn-llm.
Prerequisites: Python 3.8-3.12. On Python 3.12, run export BUILD_CUDA_EXT=0 before installing the toolkit package. If libomp.so cannot be found on a platform, it may need to be copied manually from the cross-compilation toolchain.
Resources: TTFT, Tokens/s, and memory usage vary with model size, quantization type, and platform (RK3562, RK3576, RK3588); the performance benchmarks list the figures for each combination.
Links: RKLLM_SDK, rkllm_model_zoo, Examples.
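
The Python prerequisite can be checked before installation. The sketch below is one hedged way to do that: it rejects interpreters outside the 3.8-3.12 range stated above and returns the environment variables to set, which is only BUILD_CUDA_EXT=0 on Python 3.12. The toolkit_env function and SUPPORTED constant are hypothetical names, not part of the SDK.

```python
import sys

# Supported interpreter range as stated in the prerequisites (inclusive).
SUPPORTED = ((3, 8), (3, 12))

def toolkit_env(version=None):
    """Return extra environment variables needed before installing the toolkit."""
    major_minor = tuple((version or sys.version_info)[:2])
    if not (SUPPORTED[0] <= major_minor <= SUPPORTED[1]):
        raise RuntimeError(
            f"Python {major_minor[0]}.{major_minor[1]} is outside the "
            "supported 3.8-3.12 range"
        )
    env = {}
    if major_minor == (3, 12):
        # Per the prerequisites, Python 3.12 needs this set before installing.
        env["BUILD_CUDA_EXT"] = "0"
    return env
```

A caller could merge the result into os.environ before invoking pip in a subprocess.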

Highlighted Details

Supports a wide range of LLMs including Llama, Qwen, Phi, ChatGLM3, Gemma, InternLM2, MiniCPM, and multimodal models like Qwen2-VL and MiniCPM-V.
Recent updates (v1.2.0) include custom model conversion, chat_template configuration, multi-turn dialogue, prompt cache reuse, 16K context length, GRQ Int4 quantization, GPTQ-Int8 support, and RK3562 platform compatibility.
Performance benchmarks are provided for various models across different Rockchip platforms, detailing Time To First Token (TTFT) and inference speed (Tokens/s).
Includes examples for multimodal deployment, API usage, and an API server.
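
To make the two benchmark metrics concrete, the sketch below computes TTFT and Tokens/s from token arrival timestamps. It assumes Tokens/s means decode-phase throughput, counted from the first token onward; the summary does not say how the project's own benchmarks define it, and summarize_run is a hypothetical helper.

```python
def summarize_run(request_time, token_times):
    """Return (ttft_seconds, tokens_per_second) for one generation run.

    request_time: timestamp when the prompt was submitted.
    token_times:  timestamps at which each output token arrived, in order.
    """
    if not token_times:
        raise ValueError("at least one output token is required")
    # Time To First Token: prompt submission until the first token arrives.
    ttft = token_times[0] - request_time
    # Decode throughput: tokens after the first, over the time they took.
    decode_time = token_times[-1] - token_times[0]
    decoded = len(token_times) - 1
    tokens_per_s = decoded / decode_time if decode_time > 0 else float("nan")
    return ttft, tokens_per_s
```

For example, a run submitted at t = 0.0 s whose five tokens arrive at 0.5, 0.75, 1.0, 1.25, and 1.5 s has a TTFT of 0.5 s and a decode rate of 4 tokens over 1.0 s, or 4 Tokens/s.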

Maintenance & Community

The project is actively maintained, with recent updates in v1.2.0.
For deploying additional AI models, the README points to a separate SDK, RKNN-Toolkit2 (https://github.com/airockchip/rknn-toolkit2).

Licensing & Compatibility

The README excerpt summarized here does not state a license. Because the SDK is distributed by Rockchip for its own hardware, check the repository's explicit license terms before commercial use or linking into closed-source software.

Limitations & Caveats

The README notes that libomp.so may be missing on certain platforms and must then be placed manually from the toolchain. Python 3.12 requires setting the environment variable BUILD_CUDA_EXT=0. Hardware support beyond the listed RK3588, RK3576, and RK3562 series is not detailed.

Health Check

Last Commit

1 month ago

Responsiveness

1+ week

Star History

51 stars in the last 30 days