Mobile inference engine for multimodal LLMs
mllm is a C/C++ inference engine for fast, lightweight multimodal Large Language Model (LLM) inference on mobile and edge devices. It targets researchers and developers building on-device AI applications, offering optimized performance on ARM and x86 CPUs as well as Qualcomm NPUs.
How It Works
mllm is implemented in plain C/C++ with minimal dependencies, which keeps it portable and quick to compile. It supports INT4 and INT8 quantization as well as hardware acceleration via ARM NEON, x86 AVX2, and Qualcomm Hexagon NPUs through the QNN backend. These features deliver significant performance gains and a reduced memory footprint, which is crucial in resource-constrained mobile environments.
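For intuition on why INT8 weights shrink the memory footprint, the following is a minimal sketch of generic symmetric per-tensor INT8 quantization. It is illustrative only and does not reproduce mllm's actual quantization format or kernels.

```cpp
// Generic symmetric per-tensor INT8 quantization sketch:
//   scale = max(|w|) / 127,  w_q = round(w / scale),  w ~ w_q * scale
// Illustrative only; mllm's real quantization scheme may differ.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

struct QuantizedTensor {
    std::vector<int8_t> data;  // 1 byte per weight instead of 4 (fp32)
    float scale;               // dequantize with w ~ data[i] * scale
};

QuantizedTensor quantize_int8(const std::vector<float>& w) {
    float max_abs = 0.f;
    for (float v : w) max_abs = std::max(max_abs, std::fabs(v));
    QuantizedTensor q;
    q.scale = max_abs > 0.f ? max_abs / 127.f : 1.f;
    q.data.reserve(w.size());
    for (float v : w)
        q.data.push_back(static_cast<int8_t>(std::lround(v / q.scale)));
    return q;
}

int main() {
    std::vector<float> weights = {0.12f, -0.98f, 0.47f, -0.03f};
    QuantizedTensor q = quantize_int8(weights);
    for (size_t i = 0; i < q.data.size(); ++i)
        std::printf("w=%+.2f  q=%+4d  dequant=%+.3f\n",
                    weights[i], q.data[i], q.data[i] * q.scale);
    return 0;
}
```

INT4 follows the same idea but packs two 4-bit values per byte, roughly halving the INT8 footprint again at the cost of coarser precision.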
Quick Start & Requirements
Build scripts are provided for Android, Android with the QNN backend, and host builds (build_android.sh, build_qnn_android.sh, build.sh). Building requires GCC 11.4+ or Clang 11.0+, CMake 3.18+, and Android NDK 26+ for Android targets.
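Because the build scripts target different instruction sets, it can help to confirm that a given toolchain actually enables the expected vector extensions. The snippet below is an illustrative sketch, not part of mllm's build system, and the compiler invocations in its comments are hypothetical examples.

```cpp
// simd_check.cpp -- prints which SIMD feature macros the compiler defines
// for the current target. Illustrative only; not part of mllm's build.
//
// Hypothetical example invocations (adjust paths/flags for your setup):
//   x86 host : g++ -O2 -mavx2 simd_check.cpp -o simd_check
//   Android  : <ndk-clang++ for aarch64> -O2 simd_check.cpp -o simd_check
#include <cstdio>

int main() {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    std::puts("ARM NEON: enabled");
#else
    std::puts("ARM NEON: not enabled");
#endif
#if defined(__AVX2__)
    std::puts("x86 AVX2: enabled");
#else
    std::puts("x86 AVX2: not enabled");
#endif
    return 0;
}
```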
Maintenance & Community
The project is actively updated with new model support (e.g., Phi-3 Vision, MiniCPM). It originates from research groups at BUPT and PKU.
Licensing & Compatibility
The core project is licensed under the MIT License. Some components (e.g., wenet) are licensed under Apache 2.0. This generally permits commercial use and linking with closed-source applications.
Limitations & Caveats
The QNN backend is noted as preliminary and under active development. macOS builds may experience slower performance due to OpenMP limitations with Apple's LLVM compiler.