Kthena (volcano-sh): Scalable LLM inference serving on Kubernetes
Top 88.5% on SourcePulse
Summary
Kthena is a Kubernetes-native platform designed to simplify, scale, and optimize LLM inference in production environments. It targets engineers and platform teams seeking enterprise-grade reliability and cost-efficiency for AI infrastructure. By leveraging familiar cloud-native patterns, Kthena enables high-performance LLM deployment with advanced autoscaling and multi-backend support.
How It Works
Kthena extends Kubernetes with Custom Resource Definitions (CRDs) for declarative LLM workload management. Its architecture separates control plane operations (model lifecycle, scaling) from data plane traffic routing via an intelligent router. This design supports multiple inference engines (vLLM, SGLang, Triton) and novel patterns like prefill-decode disaggregation, optimizing hardware utilization and latency.
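To make the declarative model concrete, a deployment expressed through a CRD might look like the sketch below. This manifest is hypothetical: the apiVersion, kind, and field names are assumptions for illustration, not Kthena's actual CRD schema.

```yaml
# Hypothetical manifest; Kthena's real CRD schema may differ.
apiVersion: serving.kthena.io/v1alpha1   # assumed API group/version
kind: ModelServing                       # assumed kind
metadata:
  name: llama3-8b
spec:
  model: meta-llama/Meta-Llama-3-8B-Instruct
  backend: vllm              # one of the supported engines (vLLM, SGLang, Triton)
  replicas: 2
  autoscaling:               # control-plane-managed scaling, per the description
    minReplicas: 1
    maxReplicas: 8
  disaggregation:            # prefill-decode disaggregation, if enabled
    prefillReplicas: 1
    decodeReplicas: 2
```

The point of such a resource is that the control plane reconciles model lifecycle and scaling from the declared spec, while the router handles data-plane traffic separately.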
Quick Start & Requirements
A local deployment can be started with ./hack/local-up-kthena.sh, which sets up Kthena on a local Kubernetes cluster. A working Kubernetes environment is required.
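A minimal local bring-up might look like the following session. The clone URL and the kubectl check are assumptions about the usual workflow; only the local-up script is documented by the project.

```shell
# Assumed repository URL based on the volcano-sh org; verify before use.
git clone https://github.com/volcano-sh/kthena.git
cd kthena

# Spin up a local Kubernetes cluster with Kthena installed (documented entry point).
./hack/local-up-kthena.sh

# Sanity-check that the components came up (listing all namespaces, since the
# install namespace is an assumption).
kubectl get pods -A
```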
Highlighted Details
Maintenance & Community
The project welcomes contributions and community engagement through GitHub Issues and Discussions. Project health is signaled by its Go Check, Go Report Card, and GitHub Release badges.
Licensing & Compatibility
Kthena is licensed under the Apache 2.0 License, which is generally permissive for commercial use and integration.
Limitations & Caveats
The kthena-router component is a reference implementation under active iteration, since standard gateway extensions do not natively support prefill-decode disaggregation. It can nonetheless be deployed behind a standard API gateway.