kubeai-project: Kubernetes operator for production ML model serving
Top 34.1% on SourcePulse
KubeAI is an AI Inference Operator for Kubernetes designed to simplify the deployment and scaling of machine learning models, particularly LLMs, embeddings, and speech-to-text models, in production environments. It targets Kubernetes users seeking an "it just works" solution for serving AI workloads, offering features like intelligent scaling, optimized routing, and model caching.
How It Works
KubeAI comprises a model proxy and a model operator. The proxy provides an OpenAI-compatible API and implements a novel prefix-aware load balancing strategy to optimize KV cache utilization for backend serving engines like vLLM, outperforming standard Kubernetes Services. The operator manages backend Pods, automating model downloads, volume mounting, and LoRA adapter orchestration via a custom resource definition (CRD). This architecture aims for simplicity by avoiding dependencies on external systems like Istio or Knative.
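The idea behind prefix-aware load balancing is that requests sharing a common prompt prefix should land on the same backend replica, so that replica's KV cache already holds the shared prefix. A minimal sketch of the concept, hashing the leading characters of the prompt onto a replica set (illustrative only, not KubeAI's actual routing implementation):

```python
import hashlib

def route_by_prefix(prompt: str, replicas: list[str], prefix_len: int = 128) -> str:
    """Pick a replica by hashing the prompt's leading characters.

    Requests that share a prefix hash to the same backend, improving
    the chance of a KV-cache hit there. Illustrative sketch; the real
    proxy also accounts for replica load and cache state.
    """
    prefix = prompt[:prefix_len]
    digest = int(hashlib.sha256(prefix.encode()).hexdigest(), 16)
    return replicas[digest % len(replicas)]

# Two requests with the same system prompt route to the same replica:
replicas = ["pod-a", "pod-b", "pod-c"]
shared = "You are a helpful assistant. " * 10
target_a = route_by_prefix(shared + "Question A", replicas)
target_b = route_by_prefix(shared + "Question B", replicas)
```

A plain Kubernetes Service, by contrast, spreads requests round-robin (or randomly) across endpoints, so shared-prefix requests scatter and each replica recomputes the prefix.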
Quick Start & Requirements
Requires a Kubernetes cluster (kind or minikube is supported) and Helm. Podman users may need to adjust machine memory.

```shell
helm install kubeai kubeai/kubeai --wait --timeout 10m
```

Highlighted Details
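Once the operator is installed, models are declared through its CRD. A minimal manifest might look like the following; the field names, API group, and model URL are assumptions based on typical KubeAI examples, not verified against the current API:

```yaml
# Illustrative Model resource; check the KubeAI docs for exact fields.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]                   # workload type served by this model
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # downloaded and cached by the operator
  engine: VLLM                                 # backend serving engine
  minReplicas: 0                               # scale to zero when idle
  maxReplicas: 3
```

Applying a manifest like this is what triggers the operator to download the weights, mount volumes, and manage the backend Pods described above.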
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project's license is not clearly stated in the README, which may pose a risk for commercial adoption or integration into closed-source projects.