kaito by kaito-project

Kubernetes operator for AI/ML model inference and tuning

Created 2 years ago · 744 stars · Top 46.6% on SourcePulse

View on GitHub
Project Summary

Kaito is a Kubernetes operator designed to automate the deployment and management of AI/ML model inference and tuning workloads, specifically targeting popular open-source large language models like Falcon and Phi-3. It simplifies the process of onboarding these models into Kubernetes clusters for engineers and researchers, offering an OpenAI-compatible inference server and automated GPU node provisioning.

How It Works

Kaito leverages Kubernetes Custom Resource Definitions (CRDs) and controllers. Users define a Workspace CR, specifying GPU requirements and inference configurations. The Workspace controller then automates the deployment by creating necessary Kubernetes resources (Deployments, StatefulSets, Jobs) and interacting with a Node provisioner controller (like Karpenter's gpu-provisioner) to auto-provision GPU nodes via cloud provider APIs (e.g., Azure Resource Manager). This CRD-driven approach abstracts away much of the underlying Kubernetes complexity for model deployment.
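The CRD-driven flow above can be illustrated with a minimal Workspace manifest. This is a sketch modeled on Kaito's quick-start example; the API version, instance type, and label values here are assumptions and should be checked against the project README:

```yaml
# Sketch of a Kaito Workspace CR (field values are illustrative assumptions).
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  # GPU SKU the node provisioner requests from the cloud provider
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  # Preset selects the model plus tuned runtime defaults for that hardware
  preset:
    name: phi-3.5-mini-instruct
```

Applying a manifest like this hands the rest to the Workspace controller, which creates the workload resources and asks the node provisioner for a matching GPU node.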

Quick Start & Requirements

  • Install: Deployment via Azure CLI or Terraform (links provided in README).
  • Prerequisites: Kubernetes cluster with GPU nodes, Azure CLI or Terraform. Integration with Karpenter for node provisioning is noted.
  • Demo: A quick start example demonstrates deploying a Phi-3.5-mini-instruct model using a Workspace CR.

Highlighted Details

  • Supports popular inference runtimes: vLLM and Hugging Face Transformers.
  • Automates GPU node provisioning based on model requirements.
  • Offers preset configurations to simplify GPU hardware parameter tuning.
  • Supports model fine-tuning and using fine-tuned adapters for inference (v0.3.0+).

Maintenance & Community

  • Latest Release: v0.4.5 (April 18th, 2025). First Release: v0.1.0 (November 15th, 2023).
  • Community Slack channel available.
  • Project welcomes contributions via a Contributor License Agreement (CLA) managed by the Linux Foundation.

Licensing & Compatibility

  • License: MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Currently does not handle automatic model upgrades; manually deleting and recreating workloads, or creating a new workspace, is recommended.
  • Manual override of preset configurations is limited to specific parameters via kubectl edit.
  • Integration with node provisioning relies on Karpenter-compatible controllers.
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 90
  • Issues (30d): 24
  • Star History: 33 stars in the last 30 days

Explore Similar Projects

Starred by Amanpreet Singh (Cofounder of Contextual AI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 7 more.

truss by basetenlabs
0.2% · 1k stars
Model deployment tool for productionizing AI/ML models
Created 3 years ago · Updated 1 day ago
Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 4 more.

seldon-core by SeldonIO
0.2% · 5k stars
MLOps framework for production model deployment on Kubernetes
Created 7 years ago · Updated 15 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch
0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
Created 6 years ago · Updated 1 month ago