kubeai  by substratusai

Kubernetes operator for production ML model serving

created 1 year ago
1,031 stars

Top 37.0% on sourcepulse

GitHubView on GitHub
Project Summary

KubeAI is an AI Inference Operator for Kubernetes designed to simplify the deployment and scaling of machine learning models, particularly LLMs, embeddings, and speech-to-text models, in production environments. It targets Kubernetes users seeking an "it just works" solution for serving AI workloads, offering features like intelligent scaling, optimized routing, and model caching.

How It Works

KubeAI comprises a model proxy and a model operator. The proxy provides an OpenAI-compatible API and implements a novel prefix-aware load balancing strategy to optimize KV cache utilization for backend serving engines like vLLM, outperforming standard Kubernetes Services. The operator manages backend Pods, automating model downloads, volume mounting, and LoRA adapter orchestration via a custom resource definition (CRD). This architecture aims for simplicity by avoiding dependencies on external systems like Istio or Knative.

Quick Start & Requirements

  • Install: helm install kubeai kubeai/kubeai --wait --timeout 10m
  • Prerequisites: Kubernetes cluster (local with kind or minikube is supported), Helm. Podman users may need to adjust machine memory.
  • Models: Deploy predefined models using a YAML configuration and Helm.
  • Docs: kubeai.org

Highlighted Details

  • Supports LLM inferencing (vLLM, Ollama), speech processing (FasterWhisper), and vector embeddings (Infinity).
  • Features intelligent scale-from-zero, optimized routing for improved TTFT and throughput, automated model caching, and dynamic LoRA adapter orchestration.
  • Offers OpenAI API compatibility for seamless integration with existing client libraries.
  • Zero dependencies on Istio, Knative, or Prometheus metrics adapter.

Maintenance & Community

  • Known adopters include Telescope, Google Cloud Distributed Edge, Lambda, Vultr, Arcee, and Seeweb.
  • Community discussion available on Discord. Contact information for maintainers Nick Stogner and Sam Stoelinga is provided.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project's license is not clearly stated in the README, which may pose a risk for commercial adoption or integration into closed-source projects.

Health Check
Last commit

1 week ago

Responsiveness

1 day

Pull Requests (30d)
15
Issues (30d)
2
Star History
108 stars in the last 90 days

Explore Similar Projects

Starred by Eugene Yan Eugene Yan(AI Scientist at AWS), Jared Palmer Jared Palmer(Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and
3 more.

seldon-core by SeldonIO

0.1%
5k
MLOps framework for production model deployment on Kubernetes
created 7 years ago
updated 1 day ago
Feedback? Help us improve.