nndeploy by nndeploy

Multi-platform inference deployment framework

created 2 years ago
1,109 stars

Top 35.1% on sourcepulse

Project Summary

This framework simplifies high-performance, multi-platform inference deployment for AI models. It targets developers needing to deploy models across diverse hardware and operating systems, offering a unified codebase and efficient execution.

How It Works

The core approach models deployment as a Directed Acyclic Graph (DAG), where preprocessing, inference, and postprocessing are distinct nodes. This DAG structure, with its "graph-in-graph" capability, allows for modularity and efficient composition of complex multi-model pipelines. The framework emphasizes performance through various parallel execution modes (serial, pipeline, task, and combined) and resource management via thread and memory pools.
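The DAG idea can be sketched in a few lines of Python. This is only an illustration of the concept described above, not nndeploy's actual API: the `Node` and `Graph` classes, the `preprocess`/`infer`/`postprocess` functions, and the toy "brightness" model are all hypothetical names invented for this example. Because `Graph` is itself a `Node`, a whole sub-pipeline can be dropped into a larger graph, which is one plausible reading of "graph-in-graph".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical classes for illustration only; this is not nndeploy's API.


class Node:
    """One processing step; `fn` receives the outputs of the named inputs."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)

    def run(self, values):
        return self.fn(*(values[i] for i in self.inputs))


class Graph(Node):
    """A DAG of nodes that is itself a Node, so graphs can be nested."""

    def __init__(self, name, inputs, output):
        super().__init__(name, fn=None, inputs=inputs)
        self.nodes = {}
        self.output = output  # name of the node whose result the graph returns

    def add(self, node):
        self.nodes[node.name] = node
        return self

    def run(self, values):
        # Only the graph's declared inputs are visible to its inner nodes.
        local = {i: values[i] for i in self.inputs}
        # Edges: each node depends on those inputs that are nodes of this graph.
        deps = {
            n.name: {d for d in n.inputs if d in self.nodes}
            for n in self.nodes.values()
        }
        # Serial execution in dependency order.
        for name in TopologicalSorter(deps).static_order():
            local[name] = self.nodes[name].run(local)
        return local[self.output]


def preprocess(image):
    return [p / 255.0 for p in image]  # scale pixels to [0, 1]


def infer(tensor):
    return sum(tensor) / len(tensor)  # stand-in for a real backend call


def postprocess(score):
    return "bright" if score > 0.5 else "dark"


# Inner graph: preprocessing -> inference -> postprocessing.
detector = Graph("detector", inputs=["image"], output="post")
detector.add(Node("pre", preprocess, ["image"]))
detector.add(Node("infer", infer, ["pre"]))
detector.add(Node("post", postprocess, ["infer"]))

# Outer graph that embeds the inner graph as a single node.
pipeline = Graph("pipeline", inputs=["image"], output="report")
pipeline.add(detector)
pipeline.add(Node("report", lambda label: f"label={label}", ["detector"]))

result = pipeline.run({"image": [255, 255, 0, 255]})
print(result)
```

The sketch runs nodes serially in topological order; a real framework would presumably swap in a different executor for the pipeline or task-parallel modes without changing how the graph is declared.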

Quick Start & Requirements

  • Installation: Compilation from source is the primary method.
  • Prerequisites: Specific build requirements depend on the target platform and inference backend. Linux is the most comprehensively supported OS.
  • Resources: Compilation time and resource usage will vary.
  • Documentation: Docs (文档)

Highlighted Details

  • Supports a wide range of inference backends, including TensorRT, OpenVINO, ONNX Runtime, MNN, TNN, ncnn, Core ML, AscendCL, and RKNN.
  • Enables zero-copy operations between preprocessing and inference for improved end-to-end performance.
  • Offers flexible parallel execution strategies for optimizing throughput and latency.
  • Facilitates rapid demo construction with support for various input/output formats.
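To make the pipeline-parallel strategy mentioned above more concrete, here is a minimal sketch using only the Python standard library. It is not taken from nndeploy: `stage_worker`, `run_pipeline`, the `depth` parameter, and the toy stage functions are hypothetical names chosen for this example. Each stage runs in its own thread and hands results to the next stage through a bounded queue, so stage N can work on item k while stage N+1 works on item k-1.

```python
import queue
import threading

# Hypothetical helper names for illustration only; this is not nndeploy's API.

_STOP = object()  # sentinel that tells a stage to shut down


def stage_worker(fn, inbox, outbox):
    """Apply `fn` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(stages, items, depth=2):
    """Run `stages` with one thread per stage and bounded queues in between."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=stage_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for worker in workers:
        worker.start()

    def feed():
        # Feed from a separate thread so the caller can drain results while
        # the bounded queues apply back-pressure.
        for item in items:
            queues[0].put(item)
        queues[0].put(_STOP)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while (out := queues[-1].get()) is not _STOP:
        results.append(out)

    feeder.join()
    for worker in workers:
        worker.join()
    return results


def preprocess(x):
    return x * 2


def infer(x):
    return x + 1  # stand-in for a real backend call


def postprocess(x):
    return f"result:{x}"


results = run_pipeline([preprocess, infer, postprocess], range(5))
print(results)
```

Because each stage is a single thread reading from a FIFO queue, output order matches input order. In CPython the global interpreter lock limits true CPU parallelism, so this shows the structure of a pipeline mode rather than its performance; error handling inside stages is also omitted.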

Maintenance & Community

The project is under active development and welcomes contributions. A WeChat group is available for community discussion.

Licensing & Compatibility

The project's licensing is not explicitly stated in the README, which may pose a compatibility concern for commercial or closed-source integration.

Limitations & Caveats

The framework is described as still being in development. Some features, such as the memory pool and high-performance operators, are not yet complete. Support for certain models (e.g., Stable Diffusion, Qwen, SAM) is also noted as "in progress."

Health Check

  • Last commit: 12 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 9
  • Issues (30d): 17

Star History

368 stars in the last 90 days
