mllm by UbiquitousLearning

Mobile inference engine for multimodal LLMs

Created 2 years ago
1,045 stars

Top 36.0% on SourcePulse

Project Summary

mllm is a C/C++ inference engine designed for fast, lightweight inference of multimodal Large Language Models (LLMs) on mobile and edge devices. It targets researchers and developers building on-device AI applications, offering optimized performance across hardware including ARM and x86 CPUs and Qualcomm NPUs.

How It Works

mllm is implemented in plain C/C++ with minimal dependencies, enabling broad compatibility and efficient compilation. It supports advanced features like quantization (INT4, INT8) and hardware acceleration via ARM NEON, x86 AVX2, and Qualcomm's Hexagon NPU (QNN). This approach allows for significant performance gains and reduced memory footprints, crucial for resource-constrained mobile environments.

Quick Start & Requirements

Highlighted Details

  • Supports a wide range of LLMs including LLaMA, Mistral, Qwen, Phi-3, and multimodal models like LLaVA and Qwen2-VL.
  • Offers hardware acceleration for Qualcomm NPUs (QNN) with preliminary support for end-to-end inference.
  • Includes model conversion and quantization tools for custom model deployment.
  • Demonstrates end-to-end functionality with an Android application.

Maintenance & Community

The project is actively updated with new model support (e.g., Phi-3 Vision, MiniCPM). It originates from research groups at BUPT and PKU.

Licensing & Compatibility

The core project is licensed under the MIT License. Some components (e.g., wenet) are licensed under Apache 2.0. This generally permits commercial use and linking with closed-source applications.

Limitations & Caveats

The QNN backend is noted as preliminary and under active development. macOS builds may experience slower performance due to OpenMP limitations with Apple's LLVM compiler.

Health Check

  • Last Commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 65
  • Issues (30d): 5
  • Star History: 45 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Georgi Gerganov (Author of llama.cpp, whisper.cpp), and 1 more.

LLMFarm by guinmoon

Top 0.4% · 2k stars
iOS/macOS app for local LLM inference
Created 2 years ago
Updated 1 month ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Gabriel Almeida (Cofounder of Langflow), and 2 more.

torchchat by pytorch

Top 0.1% · 4k stars
PyTorch-native SDK for local LLM inference across diverse platforms
Created 1 year ago
Updated 1 week ago