gateway-api-inference-extension by kubernetes-sigs

Kubernetes extension for AI inference gateways

414 stars · created 11 months ago

Top 71.8% on sourcepulse

Project Summary

This project provides tools for building AI Inference Gateways on Kubernetes, targeting platform teams self-hosting generative AI models. It enhances existing proxies like Envoy Gateway to route inference requests intelligently, optimizing for cost and performance while simplifying the management of LLM workloads.

How It Works

The Gateway API Inference Extension (GIE) extends existing proxies with an "Endpoint Selector" component. This selector uses metrics and capabilities data from model servers (like vLLM) to make intelligent routing decisions. It prioritizes optimal endpoints based on factors such as KV-cache awareness and request cost, aiming to reduce tail latency and improve throughput. GIE also offers Kubernetes-native APIs for managing LoRA adapters, traffic splitting, and staged rollouts of models and servers.
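The idea of metrics-driven endpoint selection can be illustrated with a simplified sketch. The scoring function, weights, and metric names below are illustrative assumptions, not GIE's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    """Snapshot of a model-server replica's scraped metrics (hypothetical fields)."""
    name: str
    queue_depth: int         # requests waiting at the server
    kv_cache_util: float     # fraction of KV-cache in use, 0.0-1.0
    has_prefix_cached: bool  # replica already holds this request's prompt prefix

def score(ep: Endpoint) -> float:
    # Prefer replicas that already cached the prompt prefix,
    # then penalize queueing and KV-cache pressure. Weights are made up.
    s = 10.0 if ep.has_prefix_cached else 0.0
    s -= ep.queue_depth * 1.0
    s -= ep.kv_cache_util * 5.0
    return s

def pick_endpoint(endpoints: list[Endpoint]) -> Endpoint:
    # Route the request to the highest-scoring replica.
    return max(endpoints, key=score)

replicas = [
    Endpoint("vllm-0", queue_depth=4, kv_cache_util=0.9, has_prefix_cached=False),
    Endpoint("vllm-1", queue_depth=1, kv_cache_util=0.3, has_prefix_cached=True),
    Endpoint("vllm-2", queue_depth=0, kv_cache_util=0.2, has_prefix_cached=False),
]
print(pick_endpoint(replicas).name)  # vllm-1: the prefix-cache hit outweighs its small queue
```

Note how the cache-aware bonus lets a lightly loaded replica lose to one that can skip recomputing the prompt prefix; that trade-off is the intuition behind KV-cache-aware scheduling.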

Quick Start & Requirements

  • Install: Follow the Getting Started Guide.
  • Prerequisites: Requires a Kubernetes cluster with an ext-proc-capable proxy (e.g., Envoy Gateway, kGateway) and a compatible model server (currently vLLM with specific metrics support). Inference workloads are assumed to run on GPU accelerators.
  • Documentation: Official Website for detailed API documentation.

Highlighted Details

  • Optimizes LLM inference latency and throughput via KV-cache and cost-aware scheduling.
  • Enables Kubernetes-native declarative APIs for LoRA adapter management and staged rollouts.
  • Provides end-to-end observability for service objective attainment.
  • Facilitates safe multi-tenancy of foundation models on shared infrastructure.
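At its core, traffic splitting for a staged rollout reduces to weighted selection among model variants. A minimal sketch of that mechanism follows; the variant names and weights are illustrative, and this is not the project's API:

```python
import random
from collections import Counter

# Hypothetical canary rollout: ~90% of traffic to the base model,
# ~10% to a new LoRA adapter under evaluation.
VARIANTS = ["llama-base", "llama-lora-canary"]
WEIGHTS = [0.9, 0.1]

def route(rng: random.Random) -> str:
    # Weighted random choice over the configured variants.
    return rng.choices(VARIANTS, weights=WEIGHTS, k=1)[0]

rng = random.Random(42)  # seeded for reproducibility in this sketch
counts = Counter(route(rng) for _ in range(1000))
print(counts)  # roughly a 900/100 split across the two variants
```

In GIE this split is expressed declaratively through Kubernetes-native APIs rather than application code; the sketch only shows the underlying selection behavior.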

Maintenance & Community

  • Active community with weekly meetings (Thursday 10 AM PDT) and a dedicated Slack channel (#wg-serving).
  • Contributions are welcomed via the dev guide.

Licensing & Compatibility

  • License: Not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This project is currently in alpha (0.3 release) and not recommended for production use. Model server support is limited: only vLLM (with the required metrics support) is explicitly listed as compatible.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 129
  • Issues (30d): 90
  • Star history: 153 stars in the last 90 days

Explore Similar Projects

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

dynamo by ai-dynamo

Inference framework for distributed generative AI model serving
Top 1.1% · 5k stars · created 5 months ago · updated 19 hours ago
Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

seldon-core by SeldonIO

MLOps framework for production model deployment on Kubernetes
Top 0.1% · 5k stars · created 7 years ago · updated 1 day ago