dl_inference by wuba

General deep learning inference tool for production deployment

Created 5 years ago
414 stars

Top 70.8% on SourcePulse

Project Summary

Summary: wuba/dl_inference offers a unified, production-ready solution for deploying deep learning models from TensorFlow, PyTorch, and Caffe. It simplifies inference serving for engineers, providing robust multi-node deployment with load balancing and performance gains through TensorRT integration.

How It Works: A central gRPC access service routes requests to specialized backends: TensorFlow Serving, Seldon (for PyTorch and Caffe), and Triton Inference Server (TIS) for TensorRT engines. Key features include dynamic weighted round-robin load balancing that adapts to node health (a sketch follows) and automated TensorFlow/PyTorch-to-TensorRT conversion for performance.
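
This summary does not show the balancer's internals, so the following is a minimal Python sketch of dynamic weighted round-robin under assumed semantics: each node carries a configured weight, selection uses the smooth weighted round-robin scheme popularized by nginx, and a node's effective weight is halved on failure and restored gradually on success. Class and method names are illustrative, not dl_inference's actual API.

    class Node:
        def __init__(self, address, weight):
            self.address = address
            self.base_weight = weight   # configured weight
            self.effective = weight     # shrinks/grows with node health
            self.current = 0            # smooth-WRR accumulator

    class WeightedRoundRobin:
        def __init__(self, nodes):
            self.nodes = nodes

        def pick(self):
            # Smooth weighted round-robin: advance every node by its
            # effective weight, pick the leader, then pull the leader
            # back by the total so picks interleave proportionally.
            total = sum(n.effective for n in self.nodes)
            for n in self.nodes:
                n.current += n.effective
            best = max(self.nodes, key=lambda n: n.current)
            best.current -= total
            return best

        def report(self, node, ok):
            # Adapt to health: halve the weight on failure, recover
            # one step per success, never dropping below 1.
            if ok:
                node.effective = min(node.base_weight, node.effective + 1)
            else:
                node.effective = max(1, node.effective // 2)

Here pick() returns the next target node and report() feeds request outcomes back, so unhealthy nodes receive proportionally less traffic until they recover.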

Quick Start & Requirements: Deployment primarily uses Docker. Prerequisites include Docker, JDK 1.8+ (for the access service), and model files (SavedModel, .pth, .caffemodel). Deployment involves using the provided Dockerfiles/images for TF Serving and TIS, configuring model paths, and starting the services; clients then issue gRPC requests (an example follows). Performance benchmarks are detailed for various models on GPU (Nvidia P40) and CPU (Intel Xeon E5-2620 v4).
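
Once a SavedModel is running behind TF Serving, a client can query it over gRPC. A hedged example using the tensorflow-serving-api package; the model name "demo", input key "input", and input shape are placeholders for whatever your SavedModel's signature actually defines (8500 is TF Serving's default gRPC port):

    import grpc
    import numpy as np
    import tensorflow as tf
    from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

    channel = grpc.insecure_channel("localhost:8500")  # default TF Serving gRPC port
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = "demo"                   # assumed model name
    request.model_spec.signature_name = "serving_default"
    request.inputs["input"].CopyFrom(                  # assumed input tensor key
        tf.make_tensor_proto(np.zeros((1, 224, 224, 3), dtype=np.float32)))

    response = stub.Predict(request, timeout=5.0)
    print(response.outputs)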

Highlighted Details:

  • Achieves low inference latency: reported as low as 3 ms on CPU and 8–11 ms on a P40 GPU.
  • Automates TensorFlow/PyTorch-to-TensorRT conversion for performance (see the ONNX export sketch after this list).
  • Features dynamic weighted round-robin load balancing for multi-node resilience.
  • Supports custom pre/post-processing and flexible model invocation for PyTorch/Caffe.
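
For the TensorRT path, the usual first step is exporting the PyTorch model to ONNX. A minimal sketch, assuming a ResNet-18 stand-in and a fixed 1×3×224×224 input; the real model, shape, and opset depend on your deployment:

    import torch
    import torchvision

    model = torchvision.models.resnet18(weights=None).eval()
    dummy = torch.randn(1, 3, 224, 224)               # assumed input shape

    torch.onnx.export(
        model, dummy, "model.onnx",
        input_names=["input"], output_names=["output"],
        opset_version=11,                             # conservative opset for TensorRT
        dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    )
    # model.onnx can then be built into a TensorRT engine (e.g., with trtexec
    # or the TensorRT Python API) and served through Triton.

This export step is exactly where the PyTorch-to-ONNX pitfalls noted under Limitations & Caveats can surface (unsupported operators, shape mismatches), which is why the project also provides alternative invocation paths.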

Maintenance & Community: The project acknowledges Intel (MKL acceleration for TF Serving) and Nvidia (Triton). Contributions are welcomed via GitHub issues/PRs or email (ailab-opensource@58.com). Future plans include CPU performance acceleration (e.g., OpenVINO) and TIS GPU usability improvements. No dedicated community channels are listed.

Licensing & Compatibility: The README does not specify a software license, which blocks any serious compatibility or adoption evaluation.

Limitations & Caveats: TensorFlow models with custom operators require recompiling TensorFlow Serving from source. The underlying frameworks' complexity persists despite the simplified deployment. Known pitfalls in PyTorch-to-ONNX conversion are noted, though alternative invocation paths are provided.

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

BIG-bench by google
  • Collaborative benchmark for probing and extrapolating LLM capabilities
  • Top 0.1% · 3k stars · Created 4 years ago · Updated 1 year ago
  • Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 14 more.

text-to-text-transfer-transformer by google-research
  • Unified text-to-text transformer for NLP research
  • Top 0.1% · 6k stars · Created 6 years ago · Updated 5 months ago
  • Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 16 more.