kthena by volcano-sh

Scalable LLM inference serving on Kubernetes

Created 11 months ago
301 stars

Top 88.5% on SourcePulse

Summary

Kthena is a Kubernetes-native platform designed to simplify, scale, and optimize LLM inference in production environments. It targets engineers and platform teams seeking enterprise-grade reliability and cost-efficiency for AI infrastructure. By leveraging familiar cloud-native patterns, Kthena enables high-performance LLM deployment with advanced autoscaling and multi-backend support.

How It Works

Kthena extends Kubernetes with Custom Resource Definitions (CRDs) for declarative LLM workload management. Its architecture separates control plane operations (model lifecycle, scaling) from data plane traffic routing via an intelligent router. This design supports multiple inference engines (vLLM, SGLang, Triton) and novel patterns like prefill-decode disaggregation, optimizing hardware utilization and latency.
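To make the declarative CRD model concrete, an LLM deployment might be described along the following lines. This is a hypothetical sketch: the API group, resource kind, and field names are illustrative assumptions, not Kthena's documented schema.

```yaml
# Hypothetical manifest sketch; apiVersion, kind, and field names are
# illustrative assumptions, not Kthena's actual CRD schema.
apiVersion: serving.kthena.io/v1alpha1   # assumed API group/version
kind: ModelDeployment                    # assumed resource kind
metadata:
  name: llama-chat
spec:
  backend: vllm            # one of the supported engines (vLLM, SGLang, Triton)
  model: meta-llama/Llama-3-8B-Instruct
  replicas: 2
  resources:
    limits:
      nvidia.com/gpu: 1    # one GPU per replica
```

Consult the project's own CRD reference for the real resource kinds and fields.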

Quick Start & Requirements

For a local try-out, run ./hack/local-up-kthena.sh to stand up a local Kubernetes cluster with Kthena installed. A Kubernetes environment is required for any deployment.

Highlighted Details

  • Supports production-ready LLM serving with engines like vLLM, SGLang, Triton, and TorchServe.
  • Enables Prefill-Decode Disaggregation for optimized compute and latency.
  • Features Cost-Driven Autoscaling based on multiple metrics and budget constraints.
  • Provides Zero-Downtime Updates via rolling model updates.
  • Supports Dynamic LoRA Management for hot-swapping adapters.
  • Includes Network Topology-Aware Scheduling and Gang Scheduling for distributed inference.
  • Offers Intelligent Routing with advanced traffic policies (canary, weighted, rate limiting).
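As an illustration of the traffic policies above, a canary-style weighted split with rate limiting could be expressed declaratively along these lines. Again, this is a hypothetical sketch: the kind and field names are assumptions for illustration, not the kthena-router's actual API.

```yaml
# Hypothetical routing-policy sketch; kind and fields are assumptions.
apiVersion: serving.kthena.io/v1alpha1
kind: ModelRoute                  # assumed resource kind
metadata:
  name: llama-chat-canary
spec:
  rateLimit:
    requestsPerSecond: 100        # illustrative rate-limiting policy
  targets:
    - model: llama-chat-stable
      weight: 90                  # 90% of traffic stays on the stable revision
    - model: llama-chat-new
      weight: 10                  # 10% canary traffic to the new revision
```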

Maintenance & Community

The project welcomes contributions and community engagement through GitHub Issues and Discussions. Health is indicated by Go Check, Go Report Card, and GitHub Release badges.

Licensing & Compatibility

Kthena is licensed under the Apache 2.0 License, which is generally permissive for commercial use and integration.

Limitations & Caveats

The kthena-router component is a reference implementation under active iteration, since standard gateway extensions do not natively support prefill-decode disaggregation. It can, however, be deployed behind a standard API gateway.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 48
  • Issues (30d): 33
  • Star History: 50 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

0.1%
1k
Communication protocol for cost-efficient LLM collaboration
Created 1 year ago
Updated 1 month ago
Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Chaoyu Yang (Founder of Bento), and 3 more.

llm-d by llm-d

1.7%
3k
Kubernetes-native framework for distributed LLM inference
Created 11 months ago
Updated 1 day ago