KuiperInfer by zjhellofss

Deep learning inference library for model deployment

created 2 years ago
3,029 stars

Top 16.2% on sourcepulse

Project Summary

KuiperInfer is an open-source educational project that guides users through building a high-performance deep learning inference library from scratch. It targets students and developers interested in understanding the internals of deep learning frameworks, enhancing their C++ skills, and improving their performance in technical interviews. The project offers a comprehensive video course and accompanying code to achieve these goals.

How It Works

KuiperInfer implements a deep learning inference engine using modern C++ (C++17/20) with a focus on modular design and extensibility. It employs a computational graph approach, allowing for efficient execution of neural network operations. The library supports both CPU (via Armadillo and OpenBLAS/MKL) and CUDA backends, with an emphasis on implementing core operators like convolution, pooling, and activation functions. The project also incorporates techniques for performance optimization and provides a clear structure for managing project dependencies and testing.
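
To make the modular operator design concrete, here is a minimal sketch of what a layer interface in such an engine might look like. All names here (Tensor, Layer, ReluLayer, Forward) are illustrative assumptions, not KuiperInfer's actual API:

    #include <algorithm>
    #include <memory>
    #include <string>
    #include <vector>

    // Illustrative sketch only; KuiperInfer's real classes may differ.
    // A Tensor here is a flat float buffer; the real project uses
    // Armadillo-backed tensors on the CPU path.
    struct Tensor {
      std::vector<float> data;
    };

    // Base class every operator (convolution, pooling, activation, ...)
    // derives from; the graph executes nodes by calling Forward.
    class Layer {
     public:
      explicit Layer(std::string name) : name_(std::move(name)) {}
      virtual ~Layer() = default;
      virtual void Forward(const std::vector<std::shared_ptr<Tensor>>& inputs,
                           std::vector<std::shared_ptr<Tensor>>& outputs) = 0;

     private:
      std::string name_;
    };

    // Example operator: elementwise ReLU, y = max(x, 0).
    class ReluLayer : public Layer {
     public:
      ReluLayer() : Layer("ReLU") {}
      void Forward(const std::vector<std::shared_ptr<Tensor>>& inputs,
                   std::vector<std::shared_ptr<Tensor>>& outputs) override {
        outputs.clear();
        for (const auto& in : inputs) {
          auto out = std::make_shared<Tensor>();
          out->data.resize(in->data.size());
          std::transform(in->data.begin(), in->data.end(), out->data.begin(),
                         [](float x) { return std::max(x, 0.0f); });
          outputs.push_back(std::move(out));
        }
      }
    };

A computational graph is then a set of such layers wired together: the engine topologically sorts the nodes and invokes Forward on each in order.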

Quick Start & Requirements

  • Docker: docker pull registry.cn-hangzhou.aliyuncs.com/hellofss/kuiperinfer:latest followed by docker run -it registry.cn-hangzhou.aliyuncs.com/hellofss/kuiperinfer:latest /bin/bash.
  • Prerequisites: C++17 compiler, CMake, Armadillo, OpenBLAS/MKL, OpenMP, Google Test, Google Benchmark. CUDA is required for GPU acceleration.
  • Setup: Building from source requires compiling dependencies like Armadillo and Google Test/Benchmark; Docker provides a pre-configured environment (a minimal usage sketch follows this list).
  • Resources: Model weights and parameters need to be downloaded separately.
  • Documentation: Video course links are provided for detailed explanations: https://space.bilibili.com/1822828582
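
As a shape sketch of how an inference run is typically driven once the library is built: models are exported to PNNX format as a .param (graph structure) plus .bin (weights) pair and loaded by the runtime. The class and method names below follow the course's PNNX-driven design but are assumptions to verify against the repository:

    #include <string>

    int main() {
      const std::string param_path = "resnet18.pnnx.param";  // graph structure
      const std::string bin_path = "resnet18.pnnx.bin";      // weights
      // Assumed API shape, commented out; check the repo for the real calls:
      // kuiper_infer::RuntimeGraph graph(param_path, bin_path);
      // graph.Build();           // topologically sort and initialize layers
      // graph.Forward(inputs);   // run inference over the input tensors
      return 0;
    }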

Highlighted Details

  • Comprehensive video course covering framework design, operator implementation, and model integration.
  • Supports popular models like Llama (including Llama 3.2), Qwen 2.5, U-Net, YOLOv5, and ResNet.
  • Features both CPU (Armadillo + OpenBLAS) and CUDA backends, with Int8 quantization support for large models (see the sketch after this list).
  • Includes performance benchmarks for various models on CPU, demonstrating efficiency.
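
The Int8 support mentioned above follows the standard quantization idea. A generic sketch of symmetric per-tensor Int8 quantization (not necessarily KuiperInfer's exact scheme) looks like this:

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Generic symmetric per-tensor Int8 quantization; KuiperInfer's
    // actual scheme for large models may differ in details.
    struct QuantizedTensor {
      std::vector<int8_t> data;
      float scale;  // real_value ≈ scale * int8_value
    };

    QuantizedTensor Quantize(const std::vector<float>& x) {
      float max_abs = 0.0f;
      for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
      QuantizedTensor q;
      q.scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
      q.data.reserve(x.size());
      for (float v : x) {
        float r = std::round(v / q.scale);
        r = std::clamp(r, -127.0f, 127.0f);  // keep within int8 range
        q.data.push_back(static_cast<int8_t>(r));
      }
      return q;
    }

    float Dequantize(const QuantizedTensor& q, std::size_t i) {
      return q.scale * static_cast<float>(q.data[i]);
    }

Storing weights as Int8 plus a per-tensor scale roughly quarters memory versus FP32, which is what makes running larger Llama/Qwen checkpoints feasible on modest hardware.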

Maintenance & Community

The project is actively developed by zjhellofss, with contributions from a number of community members credited in the repository. Community engagement is encouraged through Bilibili updates and a WeChat group for course participants.

Licensing & Compatibility

KuiperInfer's code is primarily licensed under the MIT license, but it incorporates code borrowed from NCNN, which is BSD-licensed; NCNN's license is retained in the borrowed files. Both licenses are permissive, but the mix warrants careful review for commercial use.

Limitations & Caveats

The project is primarily an educational tool: although it supports several models, its focus is teaching the development process rather than serving as a production-ready, fully optimized inference engine. Some CUDA implementations may be less mature than their CPU counterparts, and the mixed MIT/BSD licensing requires careful consideration for commercial applications.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 174 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

C/C++ library for local LLM inference. Top 0.4% on sourcepulse; 84k stars; created 2 years ago; updated 15 hours ago.