katib by kubeflow

Kubernetes-native AutoML project

created 7 years ago
1,612 stars

Top 26.6% on sourcepulse

Project Summary

Katib is a Kubernetes-native AutoML toolkit designed for hyperparameter tuning, early stopping, and neural architecture search. It targets ML engineers and researchers seeking to automate model optimization within Kubernetes environments, offering framework-agnostic support and integration with various training operators.

How It Works

Katib operates by defining "Experiments" that specify the search space, algorithms, and objective metrics. It then manages "Trials," which are Kubernetes custom resources representing individual training jobs. Katib orchestrates these trials, collecting results and iteratively applying search algorithms to find optimal hyperparameters or architectures. Its Kubernetes-native design allows it to leverage the platform's scalability and resource management for distributed tuning.
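The Experiment/Trial loop described above can be sketched in plain Python. This is a toy, single-process simulation, not Katib's implementation. The `Experiment` and `Trial` classes, `suggest_random`, and the quadratic `run_training` objective are hypothetical stand-ins. Random search substitutes for whichever algorithm an Experiment configures, and trials run one at a time rather than as parallel Kubernetes jobs.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Trial:
    """Hypothetical stand-in for a Katib Trial: one set of hyperparameters."""
    params: Dict[str, float]
    objective: Optional[float] = None


@dataclass
class Experiment:
    """Hypothetical stand-in for a Katib Experiment."""
    search_space: Dict[str, Tuple[float, float]]  # name -> (min, max)
    max_trial_count: int
    goal: float  # stop once the objective reaches this value (maximize)
    trials: List[Trial] = field(default_factory=list)


def suggest_random(space: Dict[str, Tuple[float, float]], rng: random.Random) -> Dict[str, float]:
    """Random search: sample each parameter uniformly from its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}


def run_training(params: Dict[str, float]) -> float:
    """Placeholder objective; a real Trial would launch a training job."""
    lr = params["lr"]
    return 1.0 - (lr - 0.02) ** 2 * 100


def run_experiment(exp: Experiment, seed: int = 0) -> Trial:
    rng = random.Random(seed)
    best: Optional[Trial] = None
    while len(exp.trials) < exp.max_trial_count:
        trial = Trial(params=suggest_random(exp.search_space, rng))
        trial.objective = run_training(trial.params)  # "collect" the metric
        exp.trials.append(trial)
        if best is None or trial.objective > best.objective:
            best = trial
        if best.objective >= exp.goal:
            break  # goal reached: no further trials needed
    return best
```

In Katib itself, suggestions come from a separate Suggestion service and each Trial runs a real training workload; the sketch only mirrors the control flow.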

Quick Start & Requirements

  • Installation: kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.17.0"
  • Python SDK: pip install -U kubeflow-katib
  • Prerequisites: Refer to official Kubeflow documentation.
  • Resources: Kubernetes cluster required.
  • Docs: Kubeflow Katib Guide
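After installation, an Experiment is submitted as a Kubernetes custom resource. The sketch below builds a minimal `kubeflow.org/v1beta1` Experiment as a Python dict and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The field layout follows our understanding of the v1beta1 Experiment schema. The experiment name, namespace, container image, `train.py` script, `--lr` flag, and `accuracy` metric are hypothetical placeholders. Check the Kubeflow Katib Guide for the schema of your installed version.

```python
import json

# Minimal Katib Experiment (v1beta1 layout, as we understand it).
# Name, namespace, image, script, and metric name are hypothetical.
experiment = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "Experiment",
    "metadata": {"name": "random-search-example", "namespace": "kubeflow"},
    "spec": {
        "objective": {
            "type": "maximize",
            "goal": 0.99,
            "objectiveMetricName": "accuracy",
        },
        "algorithm": {"algorithmName": "random"},
        "parallelTrialCount": 2,
        "maxTrialCount": 12,
        "maxFailedTrialCount": 3,
        # Search space: one continuous parameter.
        "parameters": [
            {
                "name": "lr",
                "parameterType": "double",
                "feasibleSpace": {"min": "0.01", "max": "0.05"},
            },
        ],
        # Template for each Trial's workload; Katib fills in the placeholders.
        "trialTemplate": {
            "primaryContainerName": "training-container",
            "trialParameters": [
                {"name": "learningRate", "description": "Learning rate", "reference": "lr"},
            ],
            "trialSpec": {
                "apiVersion": "batch/v1",
                "kind": "Job",
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [
                                {
                                    "name": "training-container",
                                    "image": "example.com/my-trainer:latest",
                                    "command": [
                                        "python",
                                        "train.py",
                                        "--lr=${trialParameters.learningRate}",
                                    ],
                                }
                            ],
                            "restartPolicy": "Never",
                        }
                    }
                },
            },
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(experiment, indent=2))
```

For example, save the code as a hypothetical `make_experiment.py`, run `python make_experiment.py > experiment.json`, and then run `kubectl apply -f experiment.json`. Katib substitutes each Trial's suggested value into the `${trialParameters.learningRate}` placeholder.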

Highlighted Details

  • Supports a wide array of search algorithms including Bayesian Optimization, TPE, CMA-ES, and HyperBand.
  • Integrates with multiple ML frameworks (TensorFlow, PyTorch, XGBoost) and training operators (Kubeflow Training Operator, Argo Workflows, Tekton Pipelines).
  • Offers a Python SDK for simplified experiment creation.
  • Provides early stopping (for example, the Median Stopping Rule) to halt underperforming trials and reduce unnecessary computation.
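The Median Stopping Rule can be sketched as a pure function over reported objective histories. The sketch uses the commonly cited formulation for a maximized objective: stop a running trial at step S if its best value so far is strictly worse than the median of the completed trials' running averages up to step S. The function names and the `min_completed` threshold are hypothetical. Katib's own early-stopping service may differ in details such as the start step and the minimum number of trials it requires.

```python
from statistics import median
from typing import Sequence


def running_average(values: Sequence[float], step: int) -> float:
    """Mean of the first `step` reported objective values."""
    window = values[:step]
    return sum(window) / len(window)


def should_stop(
    trial_history: Sequence[float],
    completed_histories: Sequence[Sequence[float]],
    step: int,
    min_completed: int = 3,  # hypothetical minimum evidence threshold
) -> bool:
    """Median stopping rule for a maximized objective (a sketch)."""
    # Only completed trials that reported at least `step` values can be compared.
    eligible = [h for h in completed_histories if len(h) >= step]
    if len(eligible) < min_completed or len(trial_history) < step:
        return False  # not enough evidence yet: keep the trial running
    best_so_far = max(trial_history[:step])
    med = median([running_average(h, step) for h in eligible])
    return best_so_far < med
```

A pure function like this keeps the decision logic independent of how metrics are collected, so it can be unit-tested without a cluster.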

Maintenance & Community

  • Active community with bi-weekly AutoML and Training Working Group meetings.
  • Slack channel: #kubeflow-katib.
  • Known adopters and related presentations are showcased.

Licensing & Compatibility

  • Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Katib's effectiveness depends on the underlying Kubernetes infrastructure and on correctly configuring training jobs as custom resources (through the Experiment's trial template). Although Katib is framework-agnostic, users must ensure their training applications can be containerized and run as Kubernetes workloads.
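Katib only launches and observes containers. For that reason, the training code itself must accept hyperparameters on its command line and report metrics where a metrics collector can find them. The sketch below assumes the default StdOut collector, which, as we understand it, parses `name=value` lines such as `accuracy=0.91` from the container's output. The `--lr`/`--epochs` flags and the `fake_accuracy` curve are hypothetical stand-ins for a real training loop.

```python
import argparse
import math
from typing import List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Toy trainer for a Katib Trial")
    # The trial template would substitute the suggested value into --lr.
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=3)
    return parser.parse_args(argv)


def fake_accuracy(lr: float, epoch: int) -> float:
    """Hypothetical accuracy curve: peaks near lr=0.02 and rises with epochs."""
    closeness = math.exp(-(((lr - 0.02) / 0.01) ** 2))
    return round(0.5 + 0.45 * closeness * (1 - 0.5 ** epoch), 4)


def main(argv: Optional[Sequence[str]] = None) -> List[str]:
    args = parse_args(argv)
    lines = []
    for epoch in range(1, args.epochs + 1):
        # One "name=value" pair per line for the StdOut metrics collector.
        line = f"accuracy={fake_accuracy(args.lr, epoch)}"
        print(line, flush=True)
        lines.append(line)
    return lines


if __name__ == "__main__":
    main()
```

Returning the emitted lines as well as printing them keeps the reporting format easy to check in tests.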

Health Check

  • Last commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 14
  • Issues (30d): 4
  • Star History: 44 stars in the last 90 days