An LLM deployment project based on MNN
Top 26.8% on sourcepulse
This project provides a framework for deploying Large Language Models (LLMs) using the MNN inference engine. It targets developers and researchers looking to run LLMs efficiently on various platforms, including mobile and desktop, with support for CPU and GPU acceleration.
How It Works
The project leverages the MNN inference engine for optimized LLM execution. It includes pre-built examples for command-line interfaces (CLI), web UIs, Android applications, and Python bindings. Model conversion and export tools are also provided, enabling users to convert ONNX or other formats to MNN. The architecture supports various hardware backends through MNN compilation flags, including CUDA and Metal.
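Backend selection described above happens at build time. As a hedged sketch: MNN exposes backend toggles as CMake options (e.g. `MNN_CUDA`, `MNN_METAL` are documented MNN options); whether this project forwards them through a plain CMake build exactly like this is an assumption.

```shell
# Sketch of a manual CMake build with a GPU backend enabled.
# MNN_CUDA / MNN_METAL are MNN's CMake backend options; passing them
# directly to this project's build is an assumption, not a documented flow.
mkdir -p build && cd build
cmake .. -DMNN_CUDA=ON        # NVIDIA GPUs on Linux
# cmake .. -DMNN_METAL=ON     # Apple GPUs on macOS/iOS
cmake --build . -- -j"$(nproc)"
```

On platforms without a supported GPU backend, omitting these flags falls back to the CPU backend.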
Quick Start & Requirements
- Clone with submodules: `git clone --recurse-submodules https://github.com/wangzhaode/mnn-llm.git`
- Build using the platform-specific script: `./script/build.sh` for Linux/macOS, `./script/build.ps1` for Windows, or `./script/android_build.sh` for Android.
- Install the Python bindings: `pip install mnnllm`.
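Once built, the command-line demo can be pointed at a converted model. A minimal sketch; the binary location and the model directory name are assumptions, not taken from this summary:

```shell
# Hedged sketch: run the CLI demo against an exported MNN model directory.
# "cli_demo" and the model path are illustrative assumptions.
./build/cli_demo ./models/your-model-dir
```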
Highlighted Details
- Python bindings (`mnnllm`) are provided for easier integration.

Maintenance & Community
The project states it has been merged into the main MNN repository. Specific community channels or active maintenance status for this standalone repo are not detailed.
Licensing & Compatibility
The project's licensing is not explicitly stated in the README. Compatibility for commercial use or linking with closed-source applications would require clarification on the underlying MNN license and any specific licenses for the LLM models used.
Limitations & Caveats
The project is marked as merged into MNN, suggesting this repository might be deprecated or superseded. The README notes that example code was 100% generated by ChatGPT, which may imply potential quality or robustness issues.
Last updated: 6 months ago. Status: inactive.