Serving by PaddlePaddle

Serving framework for online inference of PaddlePaddle models

created 6 years ago · 914 stars · Top 40.7% on sourcepulse

Project Summary

Paddle Serving is a high-performance, flexible, and easy-to-use online inference service framework built on PaddlePaddle. It targets deep learning developers and enterprises seeking industrial-grade deployment solutions for machine learning models, offering low latency and high throughput.

How It Works

Paddle Serving integrates Paddle Inference and Paddle Lite for efficient server-side and edge deployment. It offers two serving frameworks: a high-performance C++ Serving backend that uses the bRPC network framework for maximum throughput and minimum latency, and an easy-to-use Python Pipeline framework built on gRPC/gRPC-Gateway for rapid development. Both support asynchronous, DAG-based pipelines for composing multiple models, along with concurrent inference, dynamic batching, and multi-card, multi-stream execution.
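
As a rough illustration of the Python Pipeline style, the sketch below follows the shape of the Pipeline web-service examples in the repo; the Op hook signatures differ slightly between releases, and the service name and config file here are assumptions.

```python
# One-node Python Pipeline service sketch, modeled on the repo's
# Pipeline examples (e.g. simple_web_service). Hook signatures vary
# across Paddle Serving releases; this shows the DAG style, not an
# exact API contract.
from paddle_serving_server.web_service import WebService, Op

class UciOp(Op):
    def preprocess(self, input_dicts, data_id, log_id):
        # A single upstream op feeds this node; unpack its output dict.
        (_, input_dict), = input_dicts.items()
        return input_dict, False, None, ""

    def postprocess(self, input_dicts, fetch_dict, data_id, log_id):
        # Pass the model's fetch outputs through unchanged.
        return fetch_dict, None, ""

class UciService(WebService):
    def get_pipeline_response(self, read_op):
        # DAG with one node: request op -> model op -> response op.
        return UciOp(name="uci", input_ops=[read_op])

service = UciService(name="uci")
service.prepare_pipeline_config("config.yml")  # ports, model path, device
service.run_service()
```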

Quick Start & Requirements

  • Installation: Docker is strongly recommended; native Linux installation and compilation from source are also supported. A minimal client sketch follows this list.
  • Prerequisites: Docker; Kubernetes for cluster deployment; vendor drivers (NVIDIA, Kunlun XPU, Huawei Ascend, etc.) for heterogeneous hardware support.
  • Resources: Detailed guides cover each hardware target and deployment scenario.
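
For a first run, the classic uci_housing demo from the project README looks roughly like the client below. The model/config paths and port are the README's defaults; the exact predict() signature (e.g. a batch flag) varies between releases, so treat this as a sketch.

```python
# bRPC client sketch for the uci_housing quick-start demo from the
# README. Assumes `pip install paddle-serving-client` and a server
# already started with:
#   python3 -m paddle_serving_server.serve --model uci_housing_model --port 9292
# Exact predict() arguments (e.g. a batch flag) vary by release.
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One normalized feature vector from the Boston housing dataset.
x = [0.0137, -0.1136, 0.2553, -0.0062, 0.0682, -0.0075, 0.0634,
     -0.0437, 0.0352, -0.0122, -0.0168, -0.0213, 0.0067]
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
print(fetch_map)  # e.g. {'price': array([[18.9]], dtype=float32)}
```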

Highlighted Details

  • Supports RESTful, gRPC, and bRPC protocols with C++, Python, and Java SDKs; an HTTP request sketch follows this list.
  • Optimized with Intel MKLDNN, Nvidia TensorRT, and low-precision quantization.
  • Provides model security features including encryption, authentication, and HTTPS gateways.
  • Offers distributed deployment for large-scale sparse parameter index models.
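
As one concrete example of the RESTful surface, a Pipeline web service exposed through gRPC-Gateway can be queried with plain HTTP. The {"key": [...], "value": [...]} payload shape follows the Pipeline examples; the port and service name below are assumptions from those examples.

```python
# Hedged HTTP sketch against a Pipeline web service's RESTful endpoint
# (gRPC-Gateway). Port 18082 and the "uci" service name come from the
# repo's Pipeline examples and may differ in your config.yml.
import json

import requests

url = "http://127.0.0.1:18082/uci/prediction"
payload = {
    "key": ["x"],
    "value": ["0.0137, -0.1136, 0.2553, -0.0062, 0.0682, -0.0075, 0.0634, "
              "-0.0437, 0.0352, -0.0122, -0.0168, -0.0213, 0.0067"],
}
resp = requests.post(url, data=json.dumps(payload))
print(resp.json())  # e.g. {"err_no": 0, "err_msg": "", "key": [...], "value": [...]}
```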

Maintenance & Community

  • Active community with QQ groups for discussion.
  • Contribution guidelines are provided, with numerous contributors acknowledged for specific features and examples.
  • Feedback and bug reports are managed via GitHub Issues.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The documentation is primarily in Simplified Chinese, with English material available for some core concepts and the contribution process. Although a wide range of hardware is supported, setup for each target requires careful attention to the corresponding guide.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2

Star History

9 stars in the last 90 days

Explore Similar Projects

gpustack by gpustack

  • GPU cluster manager for AI model deployment
  • 3k stars · 1.6% · created 1 year ago · updated 2 days ago
  • Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

serve by pytorch

  • Serve, optimize, and scale PyTorch models in production
  • 4k stars · 0.1% · created 5 years ago · updated 3 weeks ago
  • Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

llama.cpp by ggml-org

  • C/C++ library for local LLM inference
  • 84k stars · 0.4% · created 2 years ago · updated 9 hours ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.