kubedl-io: Easier and more efficient deep learning on Kubernetes
Top 59.4% on SourcePulse
KubeDL simplifies and optimizes deep learning workloads on Kubernetes. It targets ML engineers and platform operators, providing a unified controller for training and inference and improving efficiency through advanced scheduling and resource management. As a CNCF sandbox project, it aims to streamline the deployment and operation of ML models in cloud-native environments.
How It Works
KubeDL employs a unified controller to manage diverse deep learning workloads, including training and inference for frameworks like TensorFlow, PyTorch, and Mars. Its architecture incorporates advanced scheduling, cache-based acceleration, metadata persistence, and file synchronization to boost performance and simplify operations. The system also features automatic configuration tuning for ML model deployment and integrates with Morphling for containerized model packaging, deployment, and lineage tracking via Kubernetes CRDs.
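To make the CRD-driven workflow more concrete, here is a minimal sketch that builds a TFJob custom resource and submits it with the official Kubernetes Python client. It assumes KubeDL is installed and serves TFJob under `training.kubedl.io/v1alpha1` with a Kubeflow-style `tfReplicaSpecs` layout; the helper names (`build_tfjob`, `submit`), job name, and image are hypothetical. Check the field names against the CRDs actually installed in your cluster before relying on them.

```python
"""Sketch: build and optionally submit a KubeDL TFJob custom resource.

Assumptions (not confirmed by this page): KubeDL serves the TFJob CRD under
training.kubedl.io/v1alpha1 and uses a Kubeflow-style tfReplicaSpecs layout.
Helper names, job name, and image are hypothetical.
"""
from typing import Any, Dict

GROUP = "training.kubedl.io"  # assumed CRD API group
VERSION = "v1alpha1"          # assumed CRD version
PLURAL = "tfjobs"             # assumed resource plural


def build_tfjob(name: str, image: str, workers: int = 2, ps: int = 1) -> Dict[str, Any]:
    """Return a TFJob manifest with one PS group and one Worker group."""

    def replica(count: int) -> Dict[str, Any]:
        # One replica group: how many pods to run and the pod template to use.
        return {
            "replicas": count,
            "restartPolicy": "Never",
            "template": {
                "spec": {
                    "containers": [
                        # Kubeflow-style TFJob controllers look for a container
                        # named "tensorflow" by default (assumed to hold here).
                        {"name": "tensorflow", "image": image}
                    ]
                }
            },
        }

    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": {"PS": replica(ps), "Worker": replica(workers)}},
    }


def submit(manifest: Dict[str, Any], namespace: str = "default") -> Dict[str, Any]:
    """Create the custom object in the cluster (requires cluster access)."""
    # Imported lazily so build_tfjob works without the package installed.
    from kubernetes import client, config  # pip install kubernetes

    config.load_kube_config()  # read credentials from the local kubeconfig
    api = client.CustomObjectsApi()
    return api.create_namespaced_custom_object(
        group=GROUP, version=VERSION, namespace=namespace, plural=PLURAL, body=manifest
    )


if __name__ == "__main__":
    job = build_tfjob("mnist-demo", "example.com/tf-mnist:latest")  # hypothetical image
    print(job["spec"]["tfReplicaSpecs"]["Worker"]["replicas"])
```

`build_tfjob` is pure, so the manifest can be inspected or dumped to YAML before anything touches the cluster; only `submit` needs a kubeconfig. Other frameworks would follow the same pattern with their own kind and replica-spec field.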
Status: Inactive (last activity 1 year ago)