kaito by kaito-project

Kubernetes operator for AI/ML model inference and tuning

created 1 year ago
687 stars

Top 50.4% on sourcepulse

Project Summary

Kaito is a Kubernetes operator designed to automate the deployment and management of AI/ML model inference and tuning workloads, specifically targeting popular open-source large language models like Falcon and Phi-3. It simplifies the process of onboarding these models into Kubernetes clusters for engineers and researchers, offering an OpenAI-compatible inference server and automated GPU node provisioning.

How It Works

Kaito leverages Kubernetes Custom Resource Definitions (CRDs) and controllers. Users define a Workspace custom resource that specifies GPU requirements and the inference configuration. The Workspace controller then automates deployment by creating the necessary Kubernetes resources (Deployments, StatefulSets, Jobs) and interacting with a Karpenter-compatible node provisioner controller (such as gpu-provisioner) to auto-provision GPU nodes via cloud provider APIs (e.g., Azure Resource Manager). This CRD-driven approach abstracts away much of the underlying Kubernetes complexity of model deployment.
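A Workspace CR of the kind described above can be sketched as follows. The instance type, label selector, and preset name mirror the project's Phi-3.5 quick-start example but should be verified against the current README before use:

```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  # GPU SKU the node provisioner should create for this workload
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  preset:
    # Preset name selects the model weights and runtime defaults
    name: phi-3.5-mini-instruct
```

Applying this single manifest is what triggers both GPU node provisioning and the creation of the inference workload.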

Quick Start & Requirements

  • Install: Deployment via Azure CLI or Terraform (links provided in README).
  • Prerequisites: Kubernetes cluster with GPU nodes, Azure CLI or Terraform. Node provisioning integrates with Karpenter-compatible controllers.
  • Demo: A quick start example demonstrates deploying a Phi-3.5-mini-instruct model using a Workspace CR.
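Assuming a Workspace manifest along the lines of the quick start (the file name and in-cluster service name below are illustrative), deployment reduces to standard kubectl commands; the final request goes through the OpenAI-compatible API that Kaito's inference server exposes:

```shell
# Apply the Workspace manifest (file name is illustrative)
kubectl apply -f workspace-phi-3-5-mini.yaml

# Watch the workspace until the controller reports it ready
kubectl get workspace workspace-phi-3-5-mini -w

# Query the OpenAI-compatible endpoint from inside the cluster
# (service name assumed to match the workspace name)
kubectl run -it --rm curl --image=curlimages/curl --restart=Never -- \
  curl -s http://workspace-phi-3-5-mini/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"phi-3.5-mini-instruct","messages":[{"role":"user","content":"Hello"}]}'
```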

Highlighted Details

  • Supports popular inference runtimes: vLLM and Hugging Face Transformers.
  • Automates GPU node provisioning based on model requirements.
  • Offers preset configurations to simplify GPU hardware parameter tuning.
  • Supports model fine-tuning and using fine-tuned adapters for inference (v0.3.0+).
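The fine-tuning support (v0.3.0+) is also driven by the Workspace CR, with a tuning section in place of the inference section. The sketch below is illustrative only: the method, input, and output fields, the dataset URL, and the registry path are all assumptions to check against Kaito's tuning documentation:

```yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-tuning-phi-3
resource:
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      app: tuning-phi-3
tuning:
  preset:
    name: phi-3-mini-128k-instruct
  method: qlora                                  # parameter-efficient tuning method (assumed field)
  input:
    urls:
      - "https://example.com/dataset.parquet"    # training data location (placeholder)
  output:
    image: "myregistry.azurecr.io/adapters/phi-3:latest"  # adapter image destination (placeholder)
```

The resulting adapter can then be referenced from an inference Workspace, per the v0.3.0+ adapter support noted above.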

Maintenance & Community

  • Latest Release: v0.4.5 (April 18th, 2025). First Release: v0.1.0 (November 15th, 2023).
  • Community Slack channel available.
  • Project welcomes contributions via a Contributor License Agreement (CLA) managed by the Linux Foundation.

Licensing & Compatibility

  • License: MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Automatic model upgrades are not yet handled; the recommended approach is to manually delete and recreate workloads, or create a new workspace.
  • Manual override of preset configurations is limited to specific parameters via kubectl edit.
  • Integration with node provisioning relies on Karpenter-compatible controllers.
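The preset-override caveat above corresponds to editing the live workspace object in place; which parameters are actually honored depends on the preset (the command shown is generic kubectl usage, and the workspace name is illustrative):

```shell
# Open the live Workspace object in an editor; only the parameters
# the preset exposes for override take effect after saving
kubectl edit workspace workspace-phi-3-5-mini
```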

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 83
  • Issues (30d): 52
  • Star History: 107 stars in the last 90 days

Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.
