kubedl by kubedl-io

Easier and efficient deep learning on Kubernetes

Created 6 years ago

531 stars

Top 59.7% on SourcePulse

1 Expert Loves This Project

Meng Zhang

Cofounder of TabbyML

Project Summary

KubeDL simplifies and optimizes deep learning workloads on Kubernetes. Targeting ML engineers and platform operators, it provides a unified controller for training and inference, enhancing efficiency through advanced scheduling and resource management. As a CNCF sandbox project, it aims to streamline the deployment and operation of ML models in cloud-native environments.

How It Works

KubeDL uses a single unified controller to manage diverse deep learning workloads, covering both training and inference for frameworks such as TensorFlow, PyTorch, and Mars. Its architecture adds advanced scheduling, cache-based acceleration, metadata persistence, and file synchronization to improve performance and simplify operations. It also offers automatic configuration tuning for ML model deployment, powered by Morphling, and supports containerized model packaging, deployment, and lineage tracking through Kubernetes CRDs.
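To make the "one controller, several frameworks" idea concrete, here is a minimal Python sketch that builds job manifests for two frameworks from one helper. Everything specific in it is an assumption not stated in this summary: the API group `training.kubedl.io/v1alpha1`, the field names `tfReplicaSpecs` and `pytorchReplicaSpecs`, and the container names follow the Kubeflow-style layout and should be checked against the CRDs installed in your cluster. The helper `build_job`, the image name, and the job name are hypothetical.

```python
import json

# ASSUMPTION: API group/version of the training CRDs; verify against your cluster.
API_VERSION = "training.kubedl.io/v1alpha1"

# ASSUMPTION: per-kind replica-spec field and conventional container name,
# following the Kubeflow-style job layout.
JOB_KINDS = {
    "TFJob": {"spec_field": "tfReplicaSpecs", "container": "tensorflow"},
    "PyTorchJob": {"spec_field": "pytorchReplicaSpecs", "container": "pytorch"},
}


def build_job(kind, name, image, replicas):
    """Build a training-job manifest as a plain dict (hypothetical helper).

    `replicas` maps a replica type such as "Worker" to a replica count.
    """
    if kind not in JOB_KINDS:
        raise ValueError(f"unsupported job kind: {kind}")
    info = JOB_KINDS[kind]
    # One entry per replica type; every replica runs the same image here.
    replica_specs = {
        replica_type: {
            "replicas": count,
            "restartPolicy": "Never",
            "template": {
                "spec": {
                    "containers": [{"name": info["container"], "image": image}]
                }
            },
        }
        for replica_type, count in replicas.items()
    }
    return {
        "apiVersion": API_VERSION,
        "kind": kind,
        "metadata": {"name": name},
        "spec": {info["spec_field"]: replica_specs},
    }


if __name__ == "__main__":
    # Placeholder image and job name, for illustration only.
    job = build_job("TFJob", "mnist-demo", "example.com/mnist:latest",
                    {"Worker": 2, "PS": 1})
    print(json.dumps(job, indent=2))
```

Only the kind and the replica-spec field differ between the two frameworks, which is the property a shared controller can exploit. Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` once the field names have been confirmed.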

Quick Start & Requirements

Requires a Kubernetes cluster. The required Kubernetes version and any per-framework prerequisites are not specified.
Further details may be available on the official website: https://kubedl.io.

Highlighted Details

CNCF sandbox project, hosted within the Cloud Native Computing Foundation ecosystem.
Supports multiple ML frameworks including TensorFlow, PyTorch, and Mars within a single controller.
Features include advanced scheduling, cache acceleration, metadata persistence, and file sync.
Includes automatic configuration tuning for ML model deployment, powered by Morphling, and supports model lineage tracking.
Related research published in "Morphling: Fast, Near-Optimal Auto-Configuration for Cloud-Native Model Serving" (ACM SoCC 2021).

Maintenance & Community

Community engagement channels include DingTalk (discussions/usage), GitHub Issues (bugs/features), and a dedicated email list (cncf-kubedl-maintainers@lists.cncf.io) for specific topics.
Estimated response times range from less than a day to three days depending on the channel.

Licensing & Compatibility

The specific open-source license is not mentioned in the provided text. Compatibility for commercial use or integration with closed-source systems requires license clarification.

Limitations & Caveats

As a CNCF sandbox project, KubeDL may be in an early stage of development, potentially lacking mature features or stability guarantees.
The README does not specify installation instructions, detailed system requirements, or known limitations/bugs.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days