rknn-llm by airockchip

SDK for deploying AI models on Rockchip chips

created 1 year ago
902 stars

Top 41.2% on sourcepulse

Project Summary

This repository provides the RKLLM software stack, enabling users to deploy AI models, particularly Large Language Models (LLMs), on Rockchip NPUs. It targets developers and researchers working with Rockchip's RK3588, RK3576, and RK3562 series platforms, offering accelerated LLM inference and multimodal capabilities.

How It Works

The stack has two parts: RKLLM-Toolkit, which handles model conversion and quantization on a PC, and RKLLM Runtime, which performs on-device inference through a C/C++ API. Models are converted to the RKLLM format, then executed via the C API, which drives the NPU through the RKNPU kernel driver. This dedicated toolchain-plus-runtime design optimizes LLM deployment on Rockchip edge devices.
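The PC-side conversion step can be sketched as below. This is an illustrative sketch, not verified against a specific release: the `rkllm` package ships with the RKLLM-Toolkit wheel (so this will not run without it), the API names follow the repo's example scripts, and the model path, quantization dtype, and target platform are placeholder choices that may differ by version.

```python
from rkllm.api import RKLLM  # shipped with the RKLLM-Toolkit wheel, not on PyPI

llm = RKLLM()

# Load a Hugging Face-format model from a local directory (placeholder path).
ret = llm.load_huggingface(model="./Qwen2-1.5B-Instruct")
assert ret == 0, "model load failed"

# Quantize and compile for the target NPU (w8a8 on RK3588 chosen as an example).
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8",
                target_platform="rk3588")
assert ret == 0, "build failed"

# Export the converted model; the on-device RKLLM Runtime loads this
# .rkllm file through its C API for NPU-accelerated inference.
ret = llm.export_rkllm("./qwen2.rkllm")
assert ret == 0, "export failed"
```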

Quick Start & Requirements

  • Installation: Download the SDK from RKLLM_SDK, or fetch the code via git clone https://github.com/airockchip/rknn-toolkit2.
  • Prerequisites: Python 3.8-3.12. For Python 3.12, set export BUILD_CUDA_EXT=0 before installing. On some platforms libomp.so may be missing and must be copied manually from the provided toolchains.
  • Resources: Performance benchmarks indicate varying TTFT and Tokens/s based on model size, quantization, and platform (RK3562, RK3576, RK3588). Memory usage is also detailed.
  • Links: RKLLM_SDK, rkllm_model_zoo, Examples.
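The prerequisite steps above can be sketched as follows; the repository URL and the BUILD_CUDA_EXT flag come from the bullets, while the version-check one-liner is illustrative:

```shell
# Check the host Python version (RKLLM-Toolkit targets Python 3.8-3.12).
python3 -c 'import sys; print("Python %d.%d" % sys.version_info[:2])'

# Build flag the README requires when running under Python 3.12.
export BUILD_CUDA_EXT=0
echo "BUILD_CUDA_EXT=$BUILD_CUDA_EXT"

# Fetch the code (needs network access, so commented out here):
# git clone https://github.com/airockchip/rknn-toolkit2
```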

Highlighted Details

  • Supports a wide range of LLMs including Llama, Qwen, Phi, ChatGLM3, Gemma, InternLM2, MiniCPM, and multimodal models like Qwen2-VL and MiniCPM-V.
  • Recent updates (v1.2.0) include custom model conversion, chat_template configuration, multi-turn dialogue, prompt cache reuse, 16K context length, GRQ Int4 quantization, GPTQ-Int8 support, and RK3562 platform compatibility.
  • Performance benchmarks are provided for various models across different Rockchip platforms, detailing Time To First Token (TTFT) and inference speed (Tokens/s).
  • Includes examples for multimodal deployment, API usage, and an API server.

Maintenance & Community

  • The project is actively maintained, with recent updates in v1.2.0.
  • RKNN Toolkit2, a companion SDK, is referenced for deploying other (non-LLM) model types.

Licensing & Compatibility

  • The provided README snippet does not state an explicit license. The project is associated with Rockchip, so before commercial use or closed-source linking, verify the actual license terms in the repository.

Limitations & Caveats

  • The README notes potential libomp.so issues on certain platforms that require manual intervention, and Python 3.12 needs the BUILD_CUDA_EXT=0 build flag. Support for hardware beyond the listed RK3588, RK3576, and RK3562 series is not documented.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 23
  • Star History: 150 stars in the last 90 days

Explore Similar Projects

  • JittorLLMs by Jittor: low-resource LLM inference library. 2k stars; created 2 years ago, updated 5 months ago. Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems).
  • ktransformers by kvcache-ai: framework for LLM inference optimization experimentation. 15k stars; created 1 year ago, updated 2 days ago. Starred by Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), Michael Han (cofounder of Unsloth), and 1 more.
  • llama.cpp by ggml-org: C/C++ library for local LLM inference. 84k stars; created 2 years ago, updated 5 hours ago. Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.