gateway-api-inference-extension by kubernetes-sigs

Kubernetes extension for AI inference gateways

414 stars · created 11 months ago

Top 71.8% on sourcepulse

Project Summary

This project provides tools for building AI Inference Gateways on Kubernetes, targeting platform teams self-hosting generative AI models. It enhances existing proxies like Envoy Gateway to route inference requests intelligently, optimizing for cost and performance while simplifying the management of LLM workloads.

How It Works

The Gateway API Inference Extension (GIE) extends existing proxies with an "Endpoint Selector" component. This selector uses metrics and capabilities data from model servers (like vLLM) to make intelligent routing decisions. It prioritizes optimal endpoints based on factors such as KV-cache awareness and request cost, aiming to reduce tail latency and improve throughput. GIE also offers Kubernetes-native APIs for managing LoRA adapters, traffic splitting, and staged rollouts of models and servers.
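The idea of metrics-driven endpoint selection can be illustrated with a simplified sketch. The scoring function, weights, and metric names below are illustrative assumptions, not GIE's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    """Snapshot of a model-server replica's scraped metrics (hypothetical fields)."""
    name: str
    queue_depth: int         # requests waiting at the server
    kv_cache_util: float     # fraction of KV-cache in use, 0.0-1.0
    has_prefix_cached: bool  # replica already holds this request's prompt prefix

def score(ep: Endpoint) -> float:
    # Prefer replicas that already cached the prompt prefix,
    # then penalize queueing and KV-cache pressure. Weights are made up.
    s = 10.0 if ep.has_prefix_cached else 0.0
    s -= ep.queue_depth * 1.0
    s -= ep.kv_cache_util * 5.0
    return s

def pick_endpoint(endpoints: list[Endpoint]) -> Endpoint:
    # Route the request to the highest-scoring replica.
    return max(endpoints, key=score)

replicas = [
    Endpoint("vllm-0", queue_depth=4, kv_cache_util=0.9, has_prefix_cached=False),
    Endpoint("vllm-1", queue_depth=1, kv_cache_util=0.3, has_prefix_cached=True),
    Endpoint("vllm-2", queue_depth=0, kv_cache_util=0.2, has_prefix_cached=False),
]
print(pick_endpoint(replicas).name)  # vllm-1: the prefix-cache hit outweighs its small queue
```

Note how the cache-aware bonus lets a lightly loaded replica lose to one that can skip recomputing the prompt prefix; that trade-off is the intuition behind KV-cache-aware scheduling.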

Quick Start & Requirements

  • Install: Follow the Getting Started Guide.
  • Prerequisites: Requires a Kubernetes cluster with an ext-proc-capable proxy (e.g., Envoy Gateway, kGateway) and a compatible model server (currently vLLM with specific metrics support). Inference workloads are assumed to run on GPU accelerators.
  • Documentation: Official Website for detailed API documentation.

Highlighted Details

  • Optimizes LLM inference latency and throughput via KV-cache and cost-aware scheduling.
  • Enables Kubernetes-native declarative APIs for LoRA adapter management and staged rollouts.
  • Provides end-to-end observability for service objective attainment.
  • Facilitates safe multi-tenancy of foundation models on shared infrastructure.
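At its core, traffic splitting for a staged rollout reduces to weighted selection among model variants. A minimal sketch of that mechanism follows; the variant names and weights are illustrative, and this is not the project's API:

```python
import random
from collections import Counter

# Hypothetical canary rollout: ~90% of traffic to the base model,
# ~10% to a new LoRA adapter under evaluation.
VARIANTS = ["llama-base", "llama-lora-canary"]
WEIGHTS = [0.9, 0.1]

def route(rng: random.Random) -> str:
    # Weighted random choice over the configured variants.
    return rng.choices(VARIANTS, weights=WEIGHTS, k=1)[0]

rng = random.Random(42)  # seeded for reproducibility in this sketch
counts = Counter(route(rng) for _ in range(1000))
print(counts)  # roughly a 900/100 split across the two variants
```

In GIE this split is expressed declaratively through Kubernetes-native APIs rather than application code; the sketch only shows the underlying selection behavior.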

Maintenance & Community

  • Active community with weekly meetings (Thursday 10 AM PDT) and a dedicated Slack channel (#wg-serving).
  • Contributions are welcomed via the dev guide.

Licensing & Compatibility

  • License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This project is currently in alpha (0.3 release) and not recommended for production use. Model server support is limited: only vLLM (with the required metrics support) is explicitly listed as compatible.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 129
  • Issues (30d): 90
  • Star history: 153 stars in the last 90 days

Explore Similar Projects

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

dynamo by ai-dynamo

Inference framework for distributed generative AI model serving
Top 1.1% · 5k stars · created 5 months ago · updated 19 hours ago
Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

seldon-core by SeldonIO

MLOps framework for production model deployment on Kubernetes
Top 0.1% · 5k stars · created 7 years ago · updated 1 day ago