Kubernetes extension for AI inference gateways
This project provides tools for building AI Inference Gateways on Kubernetes, targeting platform teams that self-host generative AI models. It extends existing proxies such as Envoy Gateway to route inference requests intelligently, optimizing for cost and performance and simplifying the management of LLM workloads.
How It Works
The Gateway API Inference Extension (GIE) extends existing proxies with an "Endpoint Selector" component. This selector uses metrics and capability data reported by model servers (such as vLLM) to make intelligent routing decisions, prioritizing optimal endpoints based on factors such as KV-cache awareness and request cost, with the aim of reducing tail latency and improving throughput. GIE also offers Kubernetes-native APIs for managing LoRA adapters, traffic splitting, and staged rollouts of models and servers; a sketch of what these APIs look like follows.
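For illustration, the two APIs might be used together roughly as follows: an InferencePool groups the model-server Pods behind the Endpoint Selector, and an InferenceModel splits traffic between two LoRA adapter versions served from that pool. This is a minimal sketch assuming the project's alpha (v1alpha2) API; the resource names, label values, and adapter names are placeholders, and exact field names may differ between releases.

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: vllm-llama3-pool            # placeholder name
spec:
  selector:
    app: vllm-llama3                # selects the vLLM server Pods
  targetPortNumber: 8000            # port the model servers listen on
  extensionRef:
    name: endpoint-picker           # the Endpoint Selector deployment
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: chat-model                  # placeholder name
spec:
  modelName: llama3-chat            # model name clients request
  criticality: Critical             # scheduling priority for this workload
  poolRef:
    name: vllm-llama3-pool
  targetModels:                     # weighted split across LoRA adapters
    - name: llama3-chat-lora-v1     # hypothetical adapter names
      weight: 90
    - name: llama3-chat-lora-v2
      weight: 10
```

Weighted targetModels are what make staged rollouts possible: gradually shifting weight from one adapter version to the next rolls a model forward without redeploying the underlying servers.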
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
This project is currently in alpha (v0.3 release) and is not recommended for production use. Model server support is limited: vLLM is the only server explicitly documented as compatible.