volcano by volcano-sh

Kubernetes batch scheduler for AI/ML/DL, big data, and HPC workloads

Created 7 years ago

5,340 stars

Top 9.3% on SourcePulse

View on GitHub

3 Experts Love This Project

Jeff Hammerbacher

Cofounder of Cloudera

Luis Capelo

Cofounder of Lightning AI

Travis Addair

Cofounder of Predibase

Project Summary

Volcano is a Kubernetes-native batch scheduling system designed to enhance the default kube-scheduler for AI, ML, Big Data, and HPC workloads. It offers robust integration with popular frameworks like Spark, TensorFlow, PyTorch, and MPI, providing a mature and flexible solution for managing complex, high-performance computing tasks on cloud-native infrastructure.

How It Works

Volcano extends Kubernetes scheduling by introducing a custom scheduler and CRDs for defining batch jobs. It leverages concepts from kube-batch and incorporates over fifteen years of experience in operating high-performance workloads. This approach allows for advanced features like gang scheduling, resource preemption, and topology-aware scheduling, optimizing resource utilization and job execution for demanding applications.

Quick Start & Requirements

Install: kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml or via Helm: helm install volcano volcano-sh/volcano -n volcano-system --create-namespace.
Prerequisites: Kubernetes 1.12+ with CRD support.
Resources: Installation via YAML creates volcano-admission, volcano-controllers, and volcano-scheduler pods in the volcano-system namespace.
Docs: Quick Start Guide, Helm Charts.

Highlighted Details

CNCF incubating project with widespread adoption across industries.
Integrates with Spark, Flink, Ray, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene.
Supports AI, ML, Big Data, and HPC workloads.
Offers features like gang scheduling and GPU fragmentation prevention.

Maintenance & Community

Active community with hundreds of contributors.
Weekly/bi-weekly community meetings.
Contact: Slack Channel, Mailing List, WeChat.

Licensing & Compatibility

Apache 2.0 License.
Compatible with Kubernetes versions 1.17 through 1.32 (check specific version compatibility table in README for details).

Limitations & Caveats

The README notes that the one-click install script ./hack/local-up-volcano.sh is temporarily only available for x86_64 architecture. Compatibility with older Kubernetes versions (< v1.16) uses deprecated CRDs.

Health Check

Last Commit

23 hours ago

Responsiveness

1 day

Pull Requests (30d)

Issues (30d)

Star History

81 stars in the last 30 days