Serving by PaddlePaddle

Serving framework for online inference of PaddlePaddle models

created 6 years ago · 914 stars · Top 40.7% on sourcepulse

Project Summary

Paddle Serving is a high-performance, flexible, and easy-to-use online inference service framework built on PaddlePaddle. It targets deep learning developers and enterprises seeking industrial-grade deployment solutions for machine learning models, offering low latency and high throughput.

How It Works

Paddle Serving integrates Paddle Inference and Paddle Lite for efficient server-side and edge deployment. It offers two serving frameworks: a high-performance C++ Serving backend that uses the bRPC network framework for maximum throughput and minimum latency, and an easy-to-use Python Pipeline framework built on gRPC/gRPC-Gateway for rapid development. Both support asynchronous, DAG-based pipelines for composing multiple models, along with concurrent inference, dynamic batching, and multi-card, multi-stream execution.
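
As a rough illustration of the Python Pipeline style, the sketch below follows the shape of the Pipeline web-service examples in the repo; the Op hook signatures differ slightly between releases, and the service name and config file here are assumptions.

```python
# One-node Python Pipeline service sketch, modeled on the repo's
# Pipeline examples (e.g. simple_web_service). Hook signatures vary
# across Paddle Serving releases; this shows the DAG style, not an
# exact API contract.
from paddle_serving_server.web_service import WebService, Op

class UciOp(Op):
    def preprocess(self, input_dicts, data_id, log_id):
        # A single upstream op feeds this node; unpack its output dict.
        (_, input_dict), = input_dicts.items()
        return input_dict, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        # Pass the model's fetch outputs through unchanged.
        return fetch_dict, None, ""

class UciService(WebService):
    def get_pipeline_response(self, read_op):
        # DAG with one node: request op -> model op -> response op.
        return UciOp(name="uci", input_ops=[read_op])

service = UciService(name="uci")
service.prepare_pipeline_config("config.yml")  # ports, model path, device
service.run_service()
```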

Quick Start & Requirements

  • Installation: Docker is strongly recommended; native Linux installation and compilation from source are also supported. A minimal client sketch follows this list.
  • Prerequisites: Docker; Kubernetes for cluster deployment; vendor drivers (NVIDIA, Kunlun XPU, Huawei Ascend, etc.) for heterogeneous hardware support.
  • Resources: Detailed guides cover each hardware target and deployment scenario.
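
For a first run, the classic uci_housing demo from the project README looks roughly like the client below. The model/config paths and port are the README's defaults; the exact predict() signature (e.g. a batch flag) varies between releases, so treat this as a sketch.

```python
# bRPC client sketch for the uci_housing quick-start demo from the
# README. Assumes `pip install paddle-serving-client` and a server
# already started with:
#   python3 -m paddle_serving_server.serve --model uci_housing_model --port 9292
# Exact predict() arguments (e.g. a batch flag) vary by release.
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One normalized feature vector from the Boston housing dataset.
x = [0.0137, -0.1136, 0.2553, -0.0062, 0.0682, -0.0075, 0.0634,
     -0.0437, 0.0352, -0.0122, -0.0168, -0.0213, 0.0067]
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
print(fetch_map)  # e.g. {'price': array([[18.9]], dtype=float32)}
```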

Highlighted Details

  • Supports RESTful, gRPC, and bRPC protocols with C++, Python, and Java SDKs; an HTTP request sketch follows this list.
  • Optimized with Intel MKLDNN, Nvidia TensorRT, and low-precision quantization.
  • Provides model security features including encryption, authentication, and HTTPS gateways.
  • Offers distributed deployment for large-scale sparse parameter index models.
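
As one concrete example of the RESTful surface, a Pipeline web service exposed through gRPC-Gateway can be queried with plain HTTP. The {"key": [...], "value": [...]} payload shape follows the Pipeline examples; the port and service name below are assumptions from those examples.

```python
# Hedged HTTP sketch against a Pipeline web service's RESTful endpoint
# (gRPC-Gateway). Port 18082 and the "uci" service name come from the
# repo's Pipeline examples and may differ in your config.yml.
import json

import requests

url = "http://127.0.0.1:18082/uci/prediction"
payload = {
    "key": ["x"],
    "value": ["0.0137, -0.1136, 0.2553, -0.0062, 0.0682, -0.0075, 0.0634, "
              "-0.0437, 0.0352, -0.0122, -0.0168, -0.0213, 0.0067"],
}
resp = requests.post(url, data=json.dumps(payload))
print(resp.json())  # e.g. {"err_no": 0, "err_msg": "", "key": [...], "value": [...]}
```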

Maintenance & Community

  • Active community with QQ groups for discussion.
  • Contribution guidelines are provided, with numerous contributors acknowledged for specific features and examples.
  • Feedback and bug reports are managed via GitHub Issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The documentation is primarily in Simplified Chinese, with English material available for some core concepts and the contribution process. Although a wide range of hardware is supported, setup for each target requires careful attention to the corresponding guide.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2

Star History

9 stars in the last 90 days

Explore Similar Projects

gpustack by gpustack

  • GPU cluster manager for AI model deployment
  • 3k stars · 1.6% · created 1 year ago · updated 2 days ago
  • Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

serve by pytorch

  • Serve, optimize, and scale PyTorch models in production
  • 4k stars · 0.1% · created 5 years ago · updated 3 weeks ago
  • Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

llama.cpp by ggml-org

  • C/C++ library for local LLM inference
  • 84k stars · 0.4% · created 2 years ago · updated 9 hours ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.