mllm by UbiquitousLearning

Mobile inference engine for multimodal LLMs

created 1 year ago
980 stars

Top 38.5% on sourcepulse

Project Summary

mllm is a C/C++ inference engine designed for fast, lightweight inference of multimodal Large Language Models (LLMs) on mobile and edge devices. It targets researchers and developers building on-device AI applications, offering optimized performance on ARM and x86 CPUs as well as Qualcomm NPUs.

How It Works

mllm is implemented in plain C/C++ with minimal dependencies, enabling broad compatibility and efficient compilation. It supports quantization (INT4, INT8) and hardware acceleration via ARM NEON, x86 AVX2, and Qualcomm's Hexagon NPU (through QNN). These techniques deliver significant performance gains and reduced memory footprints, which is crucial in resource-constrained mobile environments.

Quick Start & Requirements

Highlighted Details

  • Supports a wide range of LLMs including LLaMA, Mistral, Qwen, Phi-3, and multimodal models like LLaVA and Qwen2-VL.
  • Offers hardware acceleration for Qualcomm NPUs (QNN) with preliminary support for end-to-end inference.
  • Includes model conversion and quantization tools for custom model deployment.
  • Demonstrates end-to-end functionality with an Android application.

Maintenance & Community

The project is actively updated with new model support (e.g., Phi-3 Vision, MiniCPM). It originates from research groups at BUPT and PKU.

Licensing & Compatibility

The core project is licensed under the MIT License. Some components (e.g., wenet) are licensed under Apache 2.0. This generally permits commercial use and linking with closed-source applications.

Limitations & Caveats

The QNN backend is noted as preliminary and under active development. macOS builds may experience slower performance due to OpenMP limitations with Apple's LLVM compiler.

Health Check

Last commit: 1 day ago
Responsiveness: 1 day
Pull Requests (30d): 43
Issues (30d): 31
Star History: 152 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

C/C++ library for local LLM inference
Top 0.4% on sourcepulse · 84k stars
created 2 years ago · updated 15 hours ago