kthena by volcano-sh

Scalable LLM inference serving on Kubernetes

Created 11 months ago
301 stars

Top 88.5% on SourcePulse

Summary

Kthena is a Kubernetes-native platform designed to simplify, scale, and optimize LLM inference in production environments. It targets engineers and platform teams seeking enterprise-grade reliability and cost-efficiency for AI infrastructure. By leveraging familiar cloud-native patterns, Kthena enables high-performance LLM deployment with advanced autoscaling and multi-backend support.

How It Works

Kthena extends Kubernetes with Custom Resource Definitions (CRDs) for declarative LLM workload management. Its architecture separates control plane operations (model lifecycle, scaling) from data plane traffic routing via an intelligent router. This design supports multiple inference engines (vLLM, SGLang, Triton) and novel patterns like prefill-decode disaggregation, optimizing hardware utilization and latency.
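To make the declarative CRD model concrete, an LLM deployment might be described along the following lines. This is a hypothetical sketch: the API group, resource kind, and field names are illustrative assumptions, not Kthena's documented schema.

```yaml
# Hypothetical manifest sketch; apiVersion, kind, and field names are
# illustrative assumptions, not Kthena's actual CRD schema.
apiVersion: serving.kthena.io/v1alpha1   # assumed API group/version
kind: ModelDeployment                    # assumed resource kind
metadata:
  name: llama-chat
spec:
  backend: vllm            # one of the supported engines (vLLM, SGLang, Triton)
  model: meta-llama/Llama-3-8B-Instruct
  replicas: 2
  resources:
    limits:
      nvidia.com/gpu: 1    # one GPU per replica
```

Consult the project's own CRD reference for the real resource kinds and fields.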

Quick Start & Requirements

For a local try-out, run ./hack/local-up-kthena.sh to stand up a local Kubernetes cluster with Kthena installed. A Kubernetes environment is required for any deployment.

Highlighted Details

  • Supports production-ready LLM serving with engines like vLLM, SGLang, Triton, and TorchServe.
  • Enables Prefill-Decode Disaggregation for optimized compute and latency.
  • Features Cost-Driven Autoscaling based on multiple metrics and budget constraints.
  • Provides Zero-Downtime Updates via rolling model updates.
  • Supports Dynamic LoRA Management for hot-swapping adapters.
  • Includes Network Topology-Aware Scheduling and Gang Scheduling for distributed inference.
  • Offers Intelligent Routing with advanced traffic policies (canary, weighted, rate limiting).
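As an illustration of the traffic policies above, a canary-style weighted split with rate limiting could be expressed declaratively along these lines. Again, this is a hypothetical sketch: the kind and field names are assumptions for illustration, not the kthena-router's actual API.

```yaml
# Hypothetical routing-policy sketch; kind and fields are assumptions.
apiVersion: serving.kthena.io/v1alpha1
kind: ModelRoute                  # assumed resource kind
metadata:
  name: llama-chat-canary
spec:
  rateLimit:
    requestsPerSecond: 100        # illustrative rate-limiting policy
  targets:
    - model: llama-chat-stable
      weight: 90                  # 90% of traffic stays on the stable revision
    - model: llama-chat-new
      weight: 10                  # 10% canary traffic to the new revision
```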

Maintenance & Community

The project welcomes contributions and community engagement through GitHub Issues and Discussions. Health is indicated by Go Check, Go Report Card, and GitHub Release badges.

Licensing & Compatibility

Kthena is licensed under the Apache 2.0 License, which is generally permissive for commercial use and integration.

Limitations & Caveats

The kthena-router component is a reference implementation under active iteration, since standard gateway extensions do not natively support prefill-decode disaggregation. It can, however, be deployed behind a standard API gateway.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 48
  • Issues (30d): 33
  • Star History: 50 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

0.1%
1k
Communication protocol for cost-efficient LLM collaboration
Created 1 year ago
Updated 1 month ago
Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Chaoyu Yang (Founder of Bento), and 3 more.

llm-d by llm-d

1.7%
3k
Kubernetes-native framework for distributed LLM inference
Created 11 months ago
Updated 1 day ago