model_server  by openvinotoolkit

Scalable inference server for OpenVINO-optimized models

created 6 years ago
745 stars

Top 47.6% on sourcepulse

GitHubView on GitHub
Project Summary

OpenVINO™ Model Server provides a scalable inference solution for models optimized with OpenVINO™. It enables remote inference, allowing lightweight clients to interact with models deployed on edge or cloud infrastructure via REST or gRPC, abstracting away framework and hardware dependencies. This makes it ideal for microservices and cloud-native applications, offering efficient resource utilization and simplified model management.

How It Works

The server hosts OpenVINO™-optimized models, exposing them through gRPC or REST APIs, mirroring TensorFlow Serving and KServe interfaces. It supports various frameworks (TensorFlow, PaddlePaddle, ONNX) and accelerators, with a Directed Acyclic Graph (DAG) scheduler for complex pipelines and custom nodes. Models can be managed dynamically, including versioning and runtime updates, with metrics compatible with Prometheus.

Quick Start & Requirements

  • Install/Run: Docker images are available on Docker Hub.
  • Prerequisites: Tested on RedHat, Ubuntu, and Windows.
  • Resources: Quick-start guides for vision and LLM use cases are available.
  • Links: QuickStart, LLM QuickStart

Highlighted Details

  • Native Windows support.
  • Text Embeddings and Reranking compatible with OpenAI and Cohere APIs.
  • Efficient Text Generation via OpenAI API.
  • gRPC streaming, MediaPipe graphs serving, and Python code execution.

Maintenance & Community

  • Binary packages for Linux and Windows are available on GitHub.
  • Submit questions, feature requests, or bug reports via GitHub issues.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project's licensing is not clearly stated in the README, which may impact commercial adoption. Specific hardware requirements or performance benchmarks beyond general optimization claims are not detailed.

Health Check
Last commit

23 hours ago

Responsiveness

Inactive

Pull Requests (30d)
79
Issues (30d)
15
Star History
24 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher Jeff Hammerbacher(Cofounder of Cloudera), Stas Bekman Stas Bekman(Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and
2 more.

gpustack by gpustack

1.6%
3k
GPU cluster manager for AI model deployment
created 1 year ago
updated 2 days ago
Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems), Tim J. Baek Tim J. Baek(Founder of Open WebUI), and
2 more.

llmware by llmware-ai

0.2%
14k
Framework for enterprise RAG pipelines using small, specialized models
created 1 year ago
updated 1 week ago
Feedback? Help us improve.