dl_inference by wuba

General deep learning inference tool for production deployment

Created 5 years ago
414 stars

Top 70.8% on SourcePulse

Project Summary

Summary: wuba/dl_inference offers a unified, production-ready solution for deploying deep learning models from TensorFlow, PyTorch, and Caffe. It simplifies inference serving for engineers, providing robust multi-node deployment with load balancing and performance gains through TensorRT integration.

How It Works: A central gRPC access service routes requests to specialized backends: TensorFlow Serving, Seldon (for PyTorch and Caffe), and Triton Inference Server (TIS) for TensorRT engines. Key features include dynamic weighted round-robin load balancing that adapts to node health (a sketch follows) and automated TensorFlow/PyTorch-to-TensorRT conversion for performance.
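
This summary does not show the balancer's internals, so the following is a minimal Python sketch of dynamic weighted round-robin under assumed semantics: each node carries a configured weight, selection uses the smooth weighted round-robin scheme popularized by nginx, and a node's effective weight is halved on failure and restored gradually on success. Class and method names are illustrative, not dl_inference's actual API.

    class Node:
        def __init__(self, address, weight):
            self.address = address
            self.base_weight = weight   # configured weight
            self.effective = weight     # shrinks/grows with node health
            self.current = 0            # smooth-WRR accumulator

    class WeightedRoundRobin:
        def __init__(self, nodes):
            self.nodes = nodes

        def pick(self):
            # Smooth weighted round-robin: advance every node by its
            # effective weight, pick the leader, then pull the leader
            # back by the total so picks interleave proportionally.
            total = sum(n.effective for n in self.nodes)
            for n in self.nodes:
                n.current += n.effective
            best = max(self.nodes, key=lambda n: n.current)
            best.current -= total
            return best

        def report(self, node, ok):
            # Adapt to health: halve the weight on failure, recover
            # one step per success, never dropping below 1.
            if ok:
                node.effective = min(node.base_weight, node.effective + 1)
            else:
                node.effective = max(1, node.effective // 2)

Here pick() returns the next target node and report() feeds request outcomes back, so unhealthy nodes receive proportionally less traffic until they recover.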

Quick Start & Requirements: Deployment primarily uses Docker. Prerequisites include Docker, JDK 1.8+ (for the access service), and model files (SavedModel, .pth, .caffemodel). Deployment involves using the provided Dockerfiles/images for TF Serving and TIS, configuring model paths, and starting the services; clients then issue gRPC requests (an example follows). Performance benchmarks are detailed for various models on GPU (Nvidia P40) and CPU (Intel Xeon E5-2620 v4).
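
Once a SavedModel is running behind TF Serving, a client can query it over gRPC. A hedged example using the tensorflow-serving-api package; the model name "demo", input key "input", and input shape are placeholders for whatever your SavedModel's signature actually defines (8500 is TF Serving's default gRPC port):

    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:8500")  # default TF Serving gRPC port
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "demo"                   # assumed model name
    request.model_spec.signature_name = "serving_default"
    request.inputs["input"].CopyFrom(                  # assumed input tensor key
        tf.make_tensor_proto(np.zeros((1, 224, 224, 3), dtype=np.float32)))

    response = stub.Predict(request, timeout=5.0)
    print(response.outputs)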

Highlighted Details:

  • Achieves low inference latency: reported as low as 3 ms on CPU and 8–11 ms on a P40 GPU.
  • Automates TensorFlow/PyTorch-to-TensorRT conversion for performance (see the ONNX export sketch after this list).
  • Features dynamic weighted round-robin load balancing for multi-node resilience.
  • Supports custom pre/post-processing and flexible model invocation for PyTorch/Caffe.
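
For the TensorRT path, the usual first step is exporting the PyTorch model to ONNX. A minimal sketch, assuming a ResNet-18 stand-in and a fixed 1×3×224×224 input; the real model, shape, and opset depend on your deployment:

    import torch
    import torchvision

    model = torchvision.models.resnet18(weights=None).eval()
    dummy = torch.randn(1, 3, 224, 224)               # assumed input shape

    torch.onnx.export(
        model, dummy, "model.onnx",
        input_names=["input"], output_names=["output"],
        opset_version=11,                             # conservative opset for TensorRT
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )
    # model.onnx can then be built into a TensorRT engine (e.g., with trtexec
    # or the TensorRT Python API) and served through Triton.

This export step is exactly where the PyTorch-to-ONNX pitfalls noted under Limitations & Caveats can surface (unsupported operators, shape mismatches), which is why the project also provides alternative invocation paths.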

Maintenance & Community: The project acknowledges Intel (MKL acceleration for TF Serving) and Nvidia (Triton). Contributions are welcomed via GitHub issues/PRs or email (ailab-opensource@58.com). Future plans include CPU performance acceleration (e.g., OpenVINO) and TIS GPU usability improvements. No dedicated community channels are listed.

Licensing & Compatibility: The README does not specify a software license, which blocks any serious compatibility or adoption evaluation.

Limitations & Caveats: TensorFlow models with custom operators require recompiling TensorFlow Serving from source. The underlying frameworks' complexity persists despite the simplified deployment. Known pitfalls in PyTorch-to-ONNX conversion are noted, though alternative invocation paths are provided.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

BIG-bench by google
  • Collaborative benchmark for probing and extrapolating LLM capabilities
  • Top 0.1% · 3k stars · Created 4 years ago · Updated 1 year ago
  • Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 14 more.

text-to-text-transfer-transformer by google-research
  • Unified text-to-text transformer for NLP research
  • Top 0.1% · 6k stars · Created 6 years ago · Updated 5 months ago
  • Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 16 more.