General deep learning inference tool for production deployment
Summary: wuba/dl_inference offers a unified, production-ready solution for deploying deep learning models from TensorFlow, PyTorch, and Caffe. It simplifies inference serving for engineers, providing robust multi-node deployment with load balancing and performance gains through TensorRT integration.
How It Works: A central gRPC access service routes requests to specialized backends: TensorFlow Serving, Seldon (for PyTorch/Caffe), and Triton Inference Server (TIS) for TensorRT. Key features include dynamic weighted round-robin load balancing that adapts to node health, and automated TensorFlow/PyTorch-to-TensorRT conversion for faster inference.
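The access service itself is a Java component and its internals are not shown in this summary; the Python sketch below only illustrates the general idea behind dynamic weighted round-robin with health feedback (a failing node's effective weight shrinks, a healthy one recovers). All names such as `Node`, `pick_node`, and `report_result` are hypothetical, not dl_inference APIs.

```python
# Illustrative-only sketch of dynamic weighted round-robin with health feedback.
from dataclasses import dataclass


@dataclass
class Node:
    address: str               # backend endpoint, e.g. a TF Serving replica
    base_weight: int           # configured weight
    effective_weight: int = 0  # adjusted up/down by health feedback
    current_weight: int = 0    # running counter used by the scheduler

    def __post_init__(self):
        self.effective_weight = self.base_weight


def pick_node(nodes):
    """Smooth weighted round-robin: requests are spread in proportion to weights."""
    total = sum(n.effective_weight for n in nodes)
    best = None
    for n in nodes:
        n.current_weight += n.effective_weight
        if best is None or n.current_weight > best.current_weight:
            best = n
    best.current_weight -= total
    return best


def report_result(node, ok):
    """Adapt a node's effective weight to its observed health."""
    if ok:
        node.effective_weight = min(node.base_weight, node.effective_weight + 1)
    else:
        node.effective_weight = max(1, node.effective_weight // 2)


# Example: three replicas, one of which keeps failing and gets de-weighted.
nodes = [Node("10.0.0.1:8500", 5), Node("10.0.0.2:8500", 3), Node("10.0.0.3:8500", 2)]
for _ in range(10):
    n = pick_node(nodes)
    report_result(n, ok=(n.address != "10.0.0.3:8500"))
    print(n.address, [x.effective_weight for x in nodes])
```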
Quick Start & Requirements:
Deployment primarily uses Docker. Prerequisites include Docker, JDK 1.8+ (for the access service), and model files (SavedModel, .pth, .caffemodel). Deployment involves using the provided Dockerfiles/images for TF Serving/TIS, configuring model paths, and starting the services. Performance benchmarks are detailed for various models on GPU (NVIDIA P40) and CPU (Intel Xeon E5-2620 v4).
Highlighted Details:
Maintenance & Community:
Acknowledges Intel (MKL for TF Serving) and NVIDIA (Triton). Contributions are welcomed via GitHub issues/PRs or email (ailab-opensource@58.com). Future plans include CPU performance acceleration (e.g., OpenVINO) and TIS GPU usability improvements. No explicit community channels are listed.
Licensing & Compatibility: The provided README content does not specify a software license, posing a significant blocker for compatibility evaluation.
Limitations & Caveats: TensorFlow models with custom operators require recompiling TensorFlow Serving source code. Underlying framework complexities persist despite deployment simplification. Potential issues with PyTorch-to-ONNX conversion are noted, though alternative paths are provided.
Last updated: 3 years ago (inactive).