Serving framework for online inference of PaddlePaddle models
Paddle Serving is a high-performance, flexible, and easy-to-use online inference service framework built on PaddlePaddle. It targets deep learning developers and enterprises seeking industrial-grade deployment solutions for machine learning models, offering low latency and high throughput.
How It Works
Paddle Serving integrates Paddle Inference and Paddle Lite for efficient serving and edge deployment. It offers two primary frameworks: a high-performance C++ Serving backend leveraging the bRPC network framework for optimal throughput and latency, and a user-friendly Python Pipeline framework built on gRPC/gRPC-Gateway for rapid development. Both support asynchronous, DAG-based pipelines for complex model compositions, concurrent inference, dynamic batching, and multi-card/multi-stream processing.
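For example, composing a single-stage DAG with the Python Pipeline framework might look like the sketch below. It assumes the `Op`, `RequestOp`, `ResponseOp`, and `PipelineServer` classes from `paddle_serving_server.pipeline`; the op name, model path, and `config.yml` are placeholders, and the pre/postprocess signatures vary somewhat across Paddle Serving releases.

```python
# A minimal sketch of a single-op Python Pipeline DAG.
# Assumes the paddle_serving_server.pipeline API of recent 0.x releases;
# "uci" and "config.yml" are placeholder names.
from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp
from paddle_serving_server.pipeline import PipelineServer


class UciOp(Op):
    def preprocess(self, input_dicts, data_id, log_id):
        # Single upstream op: unpack the request dict into model feeds.
        (_, input_dict), = input_dicts.items()
        return input_dict, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        # Pass model outputs straight through as the response payload.
        return fetch_dict, None, ""


read_op = RequestOp()                         # DAG entry: parses the request
uci_op = UciOp(name="uci", input_ops=[read_op])
response_op = ResponseOp(input_ops=[uci_op])  # DAG exit: serializes the reply

server = PipelineServer()
server.set_response_op(response_op)           # DAG is inferred from input_ops
server.prepare_server("config.yml")           # ports, worker counts, model paths
server.run_server()
```

Wiring ops through `input_ops` is what lets the framework schedule independent branches of the DAG concurrently and apply dynamic batching per op.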
Quick Start & Requirements
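As a rough quick-start sketch, a model that is already being served can be queried over RPC with the standard client. The client-config path, endpoint, and the `"x"`/`"price"` feed/fetch names below are placeholders in the style of the fit-a-line demo and depend on the deployed model.

```python
# Hedged sketch: query a model already served locally on port 9393.
# The prototxt path and feed/fetch names depend on the deployed model.
import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])

# One 13-feature sample, shaped (batch, features) as the demo model expects.
data = np.random.rand(1, 13).astype("float32")
fetch_map = client.predict(feed={"x": data}, fetch=["price"], batch=True)
print(fetch_map)  # dict mapping fetch names to numpy arrays
```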
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The documentation is primarily in Simplified Chinese; English resources cover contribution guidelines and some core concepts. Although a wide range of hardware is supported, setup for each target may require careful attention to the corresponding guides.