volcano by volcano-sh

Kubernetes batch scheduler for AI/ML/DL, big data, and HPC workloads

created 6 years ago
4,853 stars

Project Summary

Volcano is a Kubernetes-native batch scheduling system designed to enhance the default kube-scheduler for AI, ML, Big Data, and HPC workloads. It offers robust integration with popular frameworks like Spark, TensorFlow, PyTorch, and MPI, providing a mature and flexible solution for managing complex, high-performance computing tasks on cloud-native infrastructure.

How It Works

Volcano extends Kubernetes scheduling with its own scheduler and a set of CRDs (such as Job, Queue, and PodGroup) for defining batch jobs. It builds on concepts from kube-batch and draws on more than fifteen years of experience running high-performance workloads. This design enables advanced features such as gang scheduling (a job's pods are placed together or not at all), resource preemption, and topology-aware scheduling, which improve resource utilization and job execution for demanding applications.
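
To make the CRD-based approach concrete, here is a minimal sketch that builds a Volcano Job manifest in Python and prints it as JSON, which kubectl apply -f - accepts as readily as YAML. It assumes the batch.volcano.sh/v1alpha1 Job API with its minAvailable, schedulerName, queue, and tasks fields, and a queue named default (present in a standard install). The job name demo-job, the task name worker, and the busybox image are illustrative placeholders; check field names against the CRDs shipped with your Volcano release.

```python
import json


def volcano_job(name: str, image: str, replicas: int, queue: str = "default") -> dict:
    """Build a minimal Volcano Job manifest (assumed API: batch.volcano.sh/v1alpha1)."""
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Gang constraint: no pod starts until this many pods can be placed.
            "minAvailable": replicas,
            # Hand the pods to volcano-scheduler instead of the default kube-scheduler.
            "schedulerName": "volcano",
            "queue": queue,
            "tasks": [
                {
                    "name": "worker",  # hypothetical task name
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "worker",
                                    "image": image,
                                    "command": ["sh", "-c", "echo hello from volcano"],
                                }
                            ],
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Pipe this output into `kubectl apply -f -` to submit the job.
    print(json.dumps(volcano_job("demo-job", "busybox", replicas=3), indent=2))
```

Piping the output into kubectl apply -f - (for example python demo_job.py | kubectl apply -f -, with demo_job.py as a hypothetical file name) would submit the job. Setting minAvailable equal to the replica count asks Volcano not to start any worker until all three can run.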

Quick Start & Requirements

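  • Install with YAML: kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml
  • Install with Helm: add the chart repository with helm repo add volcano-sh https://volcano-sh.github.io/helm-charts and then run helm install volcano volcano-sh/volcano -n volcano-system --create-namespace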
  • Prerequisites: Kubernetes 1.12+ with CRD support.
  • Resources: Installation via YAML creates volcano-admission, volcano-controllers, and volcano-scheduler pods in the volcano-system namespace.
  • Docs: Quick Start Guide, Helm Charts.
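
After installing, one way to confirm that volcano-admission, volcano-controllers, and volcano-scheduler are up is to inspect the pods in volcano-system. The sketch below is a hypothetical helper, not part of Volcano: it parses the JSON printed by kubectl get pods -n volcano-system -o json and reports which component has no pod in the Running phase. It matches on name prefixes because Deployment pods carry generated suffixes.

```python
import json
import subprocess

# Component names taken from the Quick Start notes; pod names add generated suffixes.
EXPECTED = ("volcano-admission", "volcano-controllers", "volcano-scheduler")


def missing_components(pods_json: str) -> list[str]:
    """Return the expected components that have no pod in phase Running."""
    pods = json.loads(pods_json)["items"]
    running = [
        p["metadata"]["name"]
        for p in pods
        if p.get("status", {}).get("phase") == "Running"
    ]
    # Prefix match, e.g. "volcano-scheduler-94998fc64-klmno" counts as volcano-scheduler.
    return [c for c in EXPECTED if not any(n.startswith(c + "-") for n in running)]


def check_cluster() -> list[str]:
    """Query the current kubectl context and report components that are not running."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", "volcano-system", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return missing_components(out)
```

Call check_cluster() on a machine whose kubectl context points at the target cluster; an empty list means all three components have a running pod. A plain kubectl get pods -n volcano-system shows the same information by eye.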

Highlighted Details

  • CNCF incubating project with widespread adoption across industries.
  • Integrates with Spark, Flink, Ray, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene.
  • Supports AI, ML, Big Data, and HPC workloads.
  • Offers features like gang scheduling and GPU fragmentation prevention.
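
Gang scheduling is easiest to see in a toy model. The sketch below is not Volcano's gang plugin, which operates on PodGroups within a scheduling session; it only illustrates the all-or-nothing rule on a hypothetical cluster of nodes with free GPU counts, using simple first-fit placement as an assumed policy. Pods are placed tentatively on a copy of the cluster state and committed only if at least min_available of them fit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_gpus: int


def gang_schedule(
    nodes: list[Node], pods: int, gpus_per_pod: int, min_available: int
) -> Optional[dict[str, int]]:
    """Place up to `pods` pods first-fit; commit only if at least `min_available` fit."""
    # Work on a copy so a rejected gang leaves the real node state untouched.
    free = {n.name: n.free_gpus for n in nodes}
    placement: dict[str, int] = {}
    placed = 0
    for _ in range(pods):
        # First-fit: the first node with room for one more pod.
        target = next((name for name, gpus in free.items() if gpus >= gpus_per_pod), None)
        if target is None:
            break
        free[target] -= gpus_per_pod
        placement[target] = placement.get(target, 0) + 1
        placed += 1
    if placed < min_available:
        return None  # all-or-nothing: discard the partial placement
    for n in nodes:
        n.free_gpus = free[n.name]  # commit the tentative placement
    return placement
```

With nodes of 4 and 2 free GPUs and 2 GPUs per pod, a 4-pod gang gets nothing because only 3 pods fit, whereas placing pods one at a time could start 3 workers that hold GPUs while waiting for a fourth that never arrives. A 3-pod gang on the same nodes fits and is committed.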

Maintenance & Community

  • Active community with hundreds of contributors.
  • Weekly/bi-weekly community meetings.
  • Contact: Slack Channel, Mailing List, WeChat.

Licensing & Compatibility

  • Apache 2.0 License.
  • Compatible with Kubernetes versions 1.17 through 1.32 (see the version compatibility table in the README for per-release details).
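
Because the supported range varies between Volcano releases, a quick pre-install check can compare the cluster's version against the range for the release being deployed. This sketch hard-codes 1.17 through 1.32 from the note above as an assumption; replace the bounds with the README compatibility-table row that matches your Volcano version. It accepts gitVersion strings such as the one reported under serverVersion by kubectl version -o json, including vendor suffixes.

```python
import re

# Assumed bounds, copied from the summary above; they differ per Volcano release.
SUPPORTED_MIN = (1, 17)
SUPPORTED_MAX = (1, 32)


def k8s_minor(git_version: str) -> tuple[int, int]:
    """Parse strings like 'v1.29.3' or 'v1.27.8-eks-8cb36c9' into (major, minor)."""
    m = re.match(r"v?(\d+)\.(\d+)", git_version)
    if m is None:
        raise ValueError(f"unrecognised Kubernetes version: {git_version!r}")
    return int(m.group(1)), int(m.group(2))


def is_supported(git_version: str) -> bool:
    """True if the (major, minor) version lies inside the assumed supported range."""
    return SUPPORTED_MIN <= k8s_minor(git_version) <= SUPPORTED_MAX
```

Tuple comparison orders versions by major and then minor number, so the patch level and any vendor suffix are ignored.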

Limitations & Caveats

The README notes that the one-click install script ./hack/local-up-volcano.sh is temporarily available only for the x86_64 architecture. Support for older Kubernetes versions (< v1.16) relies on deprecated CRDs.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 72
  • Issues (30d): 50
  • Stars gained in the last 90 days: 252
