Kubernetes operator for production ML model serving
Top 37.0% on sourcepulse
KubeAI is an AI Inference Operator for Kubernetes designed to simplify the deployment and scaling of machine learning models, particularly LLMs, embedding models, and speech-to-text models, in production environments. It targets Kubernetes users seeking an "it just works" solution for serving AI workloads, offering intelligent scaling, optimized routing, and model caching.
How It Works
KubeAI comprises a model proxy and a model operator. The proxy provides an OpenAI-compatible API and implements a novel prefix-aware load balancing strategy to optimize KV cache utilization for backend serving engines like vLLM, outperforming standard Kubernetes Services. The operator manages backend Pods, automating model downloads, volume mounting, and LoRA adapter orchestration via a custom resource definition (CRD). This architecture aims for simplicity by avoiding dependencies on external systems like Istio or Knative.
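The CRD makes each model a declarative Kubernetes resource that the operator reconciles into backend Pods. Below is a minimal sketch of applying such a Model resource; the apiVersion, kind, and field names are illustrative assumptions rather than details confirmed by this summary.

# Hypothetical Model manifest; apiVersion and field names are assumptions.
kubectl apply -f - <<EOF
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]                  # served via the OpenAI-compatible proxy
  url: hf://meta-llama/Llama-3.1-8B-Instruct  # the operator automates the download
  engine: VLLM                                # backend serving engine
  resourceProfile: nvidia-gpu-l4:1
  minReplicas: 1
  maxReplicas: 3                              # operator scales backend Pods in this range
EOF

Declaring models this way is what lets the operator handle downloads, volume mounting, and LoRA adapter orchestration without extra tooling.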
Quick Start & Requirements
helm install kubeai kubeai/kubeai --wait --timeout 10m
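Once the chart is installed, the proxy's OpenAI-compatible API can be exercised by port-forwarding the KubeAI Service. A minimal sketch, assuming the Service is named kubeai, listens on port 80, serves the API under /openai/v1, and a model named llama-3.1-8b-instruct has been deployed (none of these specifics are stated above):

# Assumes the chart repo was added beforehand, e.g.:
#   helm repo add kubeai https://www.kubeai.org && helm repo update
kubectl port-forward svc/kubeai 8000:80 &
curl http://localhost:8000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'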
Requirements: a Kubernetes cluster (kind or minikube is supported) and Helm. Podman users may need to adjust machine memory.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project's license is not clearly stated in the README, which may pose a risk for commercial adoption or integration into closed-source projects.