kaito by kaito-project

Kubernetes operator for AI/ML model inference and tuning

Created 2 years ago · 744 stars · Top 46.6% on SourcePulse

View on GitHub
Project Summary

Kaito is a Kubernetes operator designed to automate the deployment and management of AI/ML model inference and tuning workloads, specifically targeting popular open-source large language models like Falcon and Phi-3. It simplifies the process of onboarding these models into Kubernetes clusters for engineers and researchers, offering an OpenAI-compatible inference server and automated GPU node provisioning.

How It Works

Kaito leverages Kubernetes Custom Resource Definitions (CRDs) and controllers. Users define a Workspace CR, specifying GPU requirements and inference configurations. The Workspace controller then automates the deployment by creating necessary Kubernetes resources (Deployments, StatefulSets, Jobs) and interacting with a Node provisioner controller (like Karpenter's gpu-provisioner) to auto-provision GPU nodes via cloud provider APIs (e.g., Azure Resource Manager). This CRD-driven approach abstracts away much of the underlying Kubernetes complexity for model deployment.
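The CRD-driven flow above can be illustrated with a minimal Workspace manifest. This is a sketch modeled on Kaito's quick-start example; the API version, instance type, and label values here are assumptions and should be checked against the project README:

```yaml
# Sketch of a Kaito Workspace CR (field values are illustrative assumptions).
apiVersion: kaito.sh/v1alpha1
kind: Workspace
metadata:
  name: workspace-phi-3-5-mini
resource:
  # GPU SKU the node provisioner requests from the cloud provider
  instanceType: "Standard_NC24ads_A100_v4"
  labelSelector:
    matchLabels:
      apps: phi-3-5
inference:
  # Preset selects the model plus tuned runtime defaults for that hardware
  preset:
    name: phi-3.5-mini-instruct
```

Applying a manifest like this hands the rest to the Workspace controller, which creates the workload resources and asks the node provisioner for a matching GPU node.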

Quick Start & Requirements

  • Install: Deployment via Azure CLI or Terraform (links provided in README).
  • Prerequisites: Kubernetes cluster with GPU nodes, Azure CLI or Terraform. Integration with Karpenter for node provisioning is noted.
  • Demo: A quick start example demonstrates deploying a Phi-3.5-mini-instruct model using a Workspace CR.

Highlighted Details

  • Supports popular inference runtimes: vLLM and Hugging Face Transformers.
  • Automates GPU node provisioning based on model requirements.
  • Offers preset configurations to simplify GPU hardware parameter tuning.
  • Supports model fine-tuning and using fine-tuned adapters for inference (v0.3.0+).

Maintenance & Community

  • Latest Release: v0.4.5 (April 18th, 2025). First Release: v0.1.0 (November 15th, 2023).
  • Community Slack channel available.
  • Project welcomes contributions via a Contributor License Agreement (CLA) managed by the Linux Foundation.

Licensing & Compatibility

  • License: MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Currently does not handle automatic model upgrades; manually deleting and recreating workloads, or creating a new workspace, is recommended.
  • Manual override of preset configurations is limited to specific parameters via kubectl edit.
  • Integration with node provisioning relies on Karpenter-compatible controllers.
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 90
  • Issues (30d): 24
  • Star History: 33 stars in the last 30 days

Explore Similar Projects

Starred by Amanpreet Singh (Cofounder of Contextual AI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 7 more.

truss by basetenlabs
0.2% · 1k stars
Model deployment tool for productionizing AI/ML models
Created 3 years ago · Updated 1 day ago
Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 4 more.

seldon-core by SeldonIO
0.2% · 5k stars
MLOps framework for production model deployment on Kubernetes
Created 7 years ago · Updated 15 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch
0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
Created 6 years ago · Updated 1 month ago