model_server by openvinotoolkit

Scalable inference server for OpenVINO-optimized models

Created 7 years ago
761 stars

Top 45.7% on SourcePulse

View on GitHub
Project Summary

OpenVINO™ Model Server provides a scalable inference solution for models optimized with OpenVINO™. It enables remote inference, allowing lightweight clients to interact with models deployed on edge or cloud infrastructure via REST or gRPC, abstracting away framework and hardware dependencies. This makes it ideal for microservices and cloud-native applications, offering efficient resource utilization and simplified model management.

How It Works

The server hosts OpenVINO™-optimized models and exposes them through gRPC and REST APIs that mirror the TensorFlow Serving and KServe interfaces. It accepts models from multiple frameworks (TensorFlow, PaddlePaddle, ONNX), runs them on a range of accelerators, and provides a Directed Acyclic Graph (DAG) scheduler for composing models and custom nodes into complex pipelines. Models can be managed dynamically, including versioning and runtime updates, and the server exports Prometheus-compatible metrics.
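
Because the REST API mirrors TensorFlow Serving, a standard predict request works against it. The snippet below is a minimal sketch, assuming a model named "resnet" served with REST on port 8000 and an NHWC float input; the model name, port, and shape are illustrative, not taken from the project docs.

    # Minimal sketch: TensorFlow Serving-style REST predict call.
    # Model name ("resnet"), REST port (8000), and input shape are assumptions.
    import numpy as np
    import requests

    image = np.random.rand(1, 224, 224, 3).astype(np.float32)  # placeholder input tensor
    payload = {"instances": image.tolist()}

    response = requests.post(
        "http://localhost:8000/v1/models/resnet:predict",  # TFS-compatible endpoint
        json=payload,
        timeout=10,
    )
    response.raise_for_status()
    print(response.json()["predictions"][0][:5])  # first few output values

The same call can also be expressed through the KServe-style REST or gRPC interfaces if those fit your client stack better.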

Quick Start & Requirements

  • Install/Run: Docker images are available on Docker Hub; a minimal gRPC client call is sketched after this list.
  • Prerequisites: Tested on Red Hat, Ubuntu, and Windows.
  • Resources: Quick-start guides for vision and LLM use cases are available.
  • Links: QuickStart, LLM QuickStart
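
For the gRPC path, the ovmsclient package (pip install ovmsclient) offers a small Python client. The snippet below is a minimal sketch, assuming gRPC on port 9000, a model named "resnet", and an input tensor named "0"; query the model metadata for the real names in your deployment.

    # Minimal sketch: gRPC inference call with ovmsclient.
    # Port (9000), model name ("resnet"), and input name ("0") are assumptions.
    import numpy as np
    from ovmsclient import make_grpc_client

    client = make_grpc_client("localhost:9000")
    print(client.get_model_metadata(model_name="resnet"))  # inspect expected inputs/outputs

    data = np.zeros((1, 224, 224, 3), dtype=np.float32)  # placeholder batch of one image
    result = client.predict(inputs={"0": data}, model_name="resnet")
    print(result)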

Highlighted Details

  • Native Windows support.
  • Text embeddings and reranking endpoints compatible with the OpenAI and Cohere APIs.
  • Efficient text generation via an OpenAI-compatible API (see the sketch after this list).
  • gRPC streaming, MediaPipe graph serving, and Python code execution.
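
Because the text generation endpoint follows the OpenAI API, the standard openai Python client can talk to it once base_url points at the server. The snippet below is a minimal sketch; the /v3 path, port 8000, and the served model name are assumptions, so check the LLM quick-start for the exact values in your deployment.

    # Minimal sketch: chat completion against the OpenAI-compatible endpoint.
    # The base_url path (/v3), port (8000), and model name are assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

    completion = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # assumed served model name
        messages=[{"role": "user", "content": "What does a model server do?"}],
    )
    print(completion.choices[0].message.content)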

Maintenance & Community

  • Binary packages for Linux and Windows are available on GitHub.
  • Submit questions, feature requests, or bug reports via GitHub issues.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project's licensing is not clearly stated in the README, which may impact commercial adoption. Specific hardware requirements or performance benchmarks beyond general optimization claims are not detailed.

Health Check

  • Last Commit: 15 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 93
  • Issues (30d): 7
  • Star History: 14 stars in the last 30 days

Explore Similar Projects

Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 4 more.

seldon-core by SeldonIO

Top 0.2% · 5k stars
MLOps framework for production model deployment on Kubernetes
Created 7 years ago · Updated 14 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

Top 0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
Created 6 years ago · Updated 1 month ago