djl-serving  by deepjavalibrary

Scalable ML model serving via HTTP endpoints

Created 4 years ago
254 stars

Top 99.0% on SourcePulse

Project Summary

DJL Serving is a high-performance, universal model deployment solution designed to make machine learning models accessible via a scalable HTTP endpoint. It targets engineers and researchers seeking an efficient way to serve diverse model types, offering benefits such as simplified deployment, automatic scaling, and high throughput within a single Java Virtual Machine (JVM).

How It Works

DJL Serving runs inference across multiple threads within a single JVM, a design the project claims delivers higher throughput than most C++-based model servers. It natively supports PyTorch TorchScript, TensorFlow SavedModel, and ONNX models as well as Python scripts, and plugins extend it to other formats such as XGBoost and LightGBM. The server scales worker threads automatically with load, batches requests dynamically to raise throughput, and can serve multiple versions of a model, or models from different engines, concurrently on a single endpoint.
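
To make the dynamic batching idea concrete, here is a minimal, generic sketch in Python. It is not DJL Serving's implementation: the function name `collect_batch` and the parameters `max_batch_size` and `max_delay_s` are hypothetical, chosen only to illustrate how a server might trade a small wait for a larger batch.

```python
import queue
import time


def collect_batch(requests, max_batch_size, max_delay_s):
    """Wait for one request, then gather more until the batch is full
    or the delay budget is spent (hypothetical helper, for illustration)."""
    batch = [requests.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_delay_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # delay budget used up: ship a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived in time
    return batch


# Five queued requests with a batch size of four yield one full batch
# and one partial batch.
pending = queue.Queue()
for i in range(5):
    pending.put(f"request-{i}")

first = collect_batch(pending, max_batch_size=4, max_delay_s=0.05)
second = collect_batch(pending, max_batch_size=4, max_delay_s=0.05)
```

The delay bound keeps latency predictable under light load, while the size bound caps how much work a single inference call takes on.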

Quick Start & Requirements

  • macOS: Install via Homebrew: brew install djl-serving. Start/stop services with brew services start djl-serving / brew services stop djl-serving.
  • Ubuntu: Download .deb package: curl -O https://publish.djl.ai/djl-serving/djl-serving_0.30.0-1_all.deb, then install: sudo dpkg -i djl-serving_0.30.0-1_all.deb.
  • Windows: Download zip from https://publish.djl.ai/djl-serving/serving-0.30.0.zip, unzip, and run serving-0.30.0\bin\serving.bat. A Chocolatey package is under consideration.
  • Docker: Run using docker run -itd -p 8080:8080 deepjavalibrary/djl-serving.
  • Prerequisites: Homebrew on macOS, dpkg on Ubuntu, or Docker; Windows requires only the zip download.
  • Documentation: Command-line help available via djl-serving --help. Links for configuration, architecture, and plugin management are referenced but not provided in the README.
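
Once the server is running, a client talks to it over HTTP on port 8080. The following sketch builds such a request with Python's standard library. The `/predictions/<model_name>` path, the model name `my_model`, and the JSON payload shape are assumptions that do not appear in the summary above, so check them against the DJL Serving documentation before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default port stated in the summary


def build_prediction_request(model_name, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for one inference call.

    The /predictions/<model_name> path is an assumption, not taken from
    the summary.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predictions/{model_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "my_model" and the payload are hypothetical placeholders.
req = build_prediction_request("my_model", {"inputs": [1.0, 2.0]})

# Sending requires a running server with a model registered under that name:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes the URL and payload easy to inspect before any network traffic occurs.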

Highlighted Details

  • Performance: Claims higher throughput than most C++ model servers based on internal benchmarks.
  • Ease of Use: Capable of serving most model types out-of-the-box.
  • Extensibility: Supports custom extensions through a plugin system.
  • Auto-scaling: Dynamically adjusts worker threads based on inference load.
  • Dynamic Batching: Aggregates inference requests to improve throughput.
  • Model Versioning: Allows loading and managing different model versions on the same endpoint.
  • Multi-Engine Support: Facilitates serving models from various deep learning frameworks simultaneously.
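
The auto-scaling behavior listed above can be pictured as a simple rule that maps request backlog to a worker count. The sketch below is a generic illustration under stated assumptions, not DJL Serving's actual policy: `target_workers`, `min_workers`, `max_workers`, and `per_worker_backlog` are hypothetical names.

```python
def target_workers(queued_requests, min_workers=1, max_workers=8,
                   per_worker_backlog=2):
    """Return a worker count sized so that each worker handles at most
    `per_worker_backlog` queued requests, clamped to the allowed range.
    Hypothetical policy, for illustration only."""
    # Ceiling division without importing math.
    needed = -(-queued_requests // per_worker_backlog)
    return max(min_workers, min(max_workers, needed))


idle = target_workers(0)       # no backlog: stay at the minimum
moderate = target_workers(5)   # 5 queued requests, 2 per worker
overload = target_workers(100) # heavy backlog: capped at the maximum
```

Clamping to a minimum keeps at least one worker warm for the next request, and the maximum prevents a burst from oversubscribing the machine.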

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), sponsorships, or roadmaps are present in the provided README.

Licensing & Compatibility

The license type and any compatibility notes for commercial or closed-source use are not specified in the provided README.

Limitations & Caveats

The Windows installation currently relies on manual zip file extraction, with official package support pending. By default, DJL Serving listens on port 8080 but is only accessible from localhost, requiring configuration changes for remote access.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 15
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days
