kthena by volcano-sh

Scalable LLM inference serving on Kubernetes

Created 1 year ago
353 stars

Top 78.8% on SourcePulse

Summary

Kthena is a Kubernetes-native platform designed to simplify, scale, and optimize LLM inference in production environments. It targets engineers and platform teams seeking enterprise-grade reliability and cost-efficiency for AI infrastructure. By leveraging familiar cloud-native patterns, Kthena enables high-performance LLM deployment with advanced autoscaling and multi-backend support.

How It Works

Kthena extends Kubernetes with Custom Resource Definitions (CRDs) for declarative LLM workload management. Its architecture separates control plane operations (model lifecycle, scaling) from data plane traffic routing via an intelligent router. This design supports multiple inference engines (vLLM, SGLang, Triton) and novel patterns like prefill-decode disaggregation, optimizing hardware utilization and latency.
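As a rough illustration of prefill-decode disaggregation, the sketch below routes each request through two separate worker pools: one for the prompt-processing (prefill) phase and one for the token-generation (decode) phase. It is a toy model, not Kthena's router: `Pool`, `route`, and the worker names are hypothetical, and round-robin selection is an assumption made for simplicity.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List


@dataclass
class Pool:
    """A named group of workers, selected round-robin (hypothetical helper)."""
    name: str
    workers: List[str]
    _next: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if not self.workers:
            raise ValueError(f"pool {self.name!r} has no workers")
        self._next = cycle(self.workers)

    def pick(self) -> str:
        """Return the next worker in round-robin order."""
        return next(self._next)


def route(prompt: str, prefill: Pool, decode: Pool) -> Dict[str, str]:
    """Assign one request to a prefill worker and a decode worker.

    Prefill processes the whole prompt in one pass; decode then generates
    tokens one at a time. Keeping the phases in separate pools lets each
    pool be sized and scaled independently.
    """
    return {
        "prompt": prompt,
        "prefill": prefill.pick(),
        "decode": decode.pick(),
    }


if __name__ == "__main__":
    prefill_pool = Pool("prefill", ["prefill-0", "prefill-1"])
    decode_pool = Pool("decode", ["decode-0", "decode-1", "decode-2"])
    for text in ["hello", "world", "again"]:
        print(route(text, prefill_pool, decode_pool))
```

A real router would also have to move the KV cache produced by prefill to the chosen decode worker; that step is left out here.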

Quick Start & Requirements

Kthena requires a Kubernetes environment. For a local cluster setup, start the installation with ./hack/local-up-kthena.sh.

Highlighted Details

  • Supports production-ready LLM serving with engines like vLLM, SGLang, Triton, and TorchServe.
  • Enables Prefill-Decode Disaggregation for optimized compute and latency.
  • Features Cost-Driven Autoscaling based on multiple metrics and budget constraints.
  • Provides Zero-Downtime Updates via rolling model updates.
  • Supports Dynamic LoRA Management for hot-swapping adapters.
  • Includes Network Topology-Aware Scheduling and Gang Scheduling for distributed inference.
  • Offers Intelligent Routing with advanced traffic policies (canary, weighted, rate limiting).
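One way to picture autoscaling on multiple metrics under a budget constraint is sketched below. It is not Kthena's algorithm: it borrows the proportional rule used by the Kubernetes Horizontal Pod Autoscaler (scale each metric toward its target and take the largest proposal), then caps the result at what an hourly budget can pay for. The function name, parameters, and metric names are all assumptions for illustration.

```python
import math
from typing import Dict, Tuple


def desired_replicas(
    current: int,
    metrics: Dict[str, Tuple[float, float]],
    cost_per_replica: float,
    budget: float,
    min_replicas: int = 1,
) -> int:
    """Return a replica count driven by metrics and capped by budget.

    metrics maps a metric name to (observed value, target value).
    cost_per_replica and budget must use the same unit, e.g. dollars per hour.
    """
    # HPA-style proposal per metric: ceil(current * observed / target).
    proposals = [
        math.ceil(current * observed / target)
        for observed, target in metrics.values()
    ]
    # The most demanding metric wins; with no metrics, hold steady.
    wanted = max(proposals, default=current)
    # Largest replica count the budget can pay for.
    affordable = math.floor(budget / cost_per_replica)
    # min_replicas takes priority over the budget in this sketch, so the
    # result can exceed the budget when affordable < min_replicas.
    return max(min_replicas, min(wanted, affordable))


if __name__ == "__main__":
    load = {"gpu_utilization": (90.0, 60.0), "queue_depth": (30.0, 10.0)}
    print(desired_replicas(4, load, cost_per_replica=2.5, budget=20.0))
    print(desired_replicas(4, load, cost_per_replica=2.5, budget=100.0))
```

In the usage example, queue depth alone would ask for 12 replicas; the smaller budget limits the result to 8, while the larger budget lets all 12 through.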

Maintenance & Community

The project welcomes contributions and community engagement through GitHub Issues and Discussions. The repository displays Go Check, Go Report Card, and GitHub Release badges as health indicators.

Licensing & Compatibility

Kthena is licensed under the Apache 2.0 License, which is generally permissive for commercial use and integration.

Limitations & Caveats

The kthena-router component is a reference implementation under active iteration. It exists because standard gateway extensions do not natively support distributing requests across separate prefill and decode workers. It can be deployed behind a standard API gateway.

Health Check

  • Last commit: 11 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 144
  • Issues (30d): 82
  • Star history: 32 stars in the last 30 days
