Mobile inference engine for multimodal LLMs
mllm is a C/C++ inference engine for fast, lightweight multimodal Large Language Model (LLM) inference on mobile and edge devices. It targets researchers and developers building on-device AI applications, offering optimized performance on ARM and x86 CPUs as well as Qualcomm NPUs.
How It Works
mllm is implemented in plain C/C++ with minimal dependencies, which keeps it portable and quick to compile. It supports INT4 and INT8 quantization as well as hardware acceleration via ARM NEON, x86 AVX2, and Qualcomm Hexagon NPUs through the QNN backend. These features deliver significant performance gains and a reduced memory footprint, which is crucial in resource-constrained mobile environments.
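For intuition on why INT8 weights shrink the memory footprint, the following is a minimal sketch of generic symmetric per-tensor INT8 quantization. It is illustrative only and does not reproduce mllm's actual quantization format or kernels.

```cpp
// Generic symmetric per-tensor INT8 quantization sketch:
//   scale = max(|w|) / 127,  w_q = round(w / scale),  w ~ w_q * scale
// Illustrative only; mllm's real quantization scheme may differ.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

struct QuantizedTensor {
    std::vector<int8_t> data;  // 1 byte per weight instead of 4 (fp32)
    float scale;               // dequantize with w ~ data[i] * scale
};

QuantizedTensor quantize_int8(const std::vector<float>& w) {
    float max_abs = 0.f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    QuantizedTensor q;
    q.scale = max_abs > 0.f ? max_abs / 127.f : 1.f;
    q.data.reserve(w.size());
    for (float v : w)
        q.data.push_back(static_cast<int8_t>(std::lround(v / q.scale)));
    return q;
}

int main() {
    std::vector<float> weights = {0.12f, -0.98f, 0.47f, -0.03f};
    QuantizedTensor q = quantize_int8(weights);
    for (size_t i = 0; i < q.data.size(); ++i)
        std::printf("w=%+.2f  q=%+4d  dequant=%+.3f\n",
                    weights[i], q.data[i], q.data[i] * q.scale);
    return 0;
}
```

INT4 follows the same idea but packs two 4-bit values per byte, roughly halving the INT8 footprint again at the cost of coarser precision.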
Quick Start & Requirements
Build scripts are provided for Android, Android with the QNN backend, and host builds (build_android.sh, build_qnn_android.sh, build.sh). Building requires GCC 11.4+ or Clang 11.0+, CMake 3.18+, and Android NDK 26+ for Android targets.
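Because the build scripts target different instruction sets, it can help to confirm that a given toolchain actually enables the expected vector extensions. The snippet below is an illustrative sketch, not part of mllm's build system, and the compiler invocations in its comments are hypothetical examples.

```cpp
// simd_check.cpp -- prints which SIMD feature macros the compiler defines
// for the current target. Illustrative only; not part of mllm's build.
//
// Hypothetical example invocations (adjust paths/flags for your setup):
//   x86 host : g++ -O2 -mavx2 simd_check.cpp -o simd_check
//   Android  : <ndk-clang++ for aarch64> -O2 simd_check.cpp -o simd_check
#include <cstdio>

int main() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    std::puts("ARM NEON: enabled");
#else
    std::puts("ARM NEON: not enabled");
#endif
#if defined(__AVX2__)
    std::puts("x86 AVX2: enabled");
#else
    std::puts("x86 AVX2: not enabled");
#endif
    return 0;
}
```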
Maintenance & Community
The project is actively updated with new model support (e.g., Phi-3 Vision, MiniCPM). It originates from research groups at BUPT and PKU.
Licensing & Compatibility
The core project is licensed under the MIT License. Some components (e.g., wenet) are licensed under Apache 2.0. This generally permits commercial use and linking with closed-source applications.
Limitations & Caveats
The QNN backend is noted as preliminary and under active development. macOS builds may experience slower performance due to OpenMP limitations with Apple's LLVM compiler.