volcano by volcano-sh

Kubernetes batch scheduler for AI/ML/DL, big data, and HPC workloads

Created 6 years ago
4,958 stars

Top 10.1% on SourcePulse

Project Summary

Volcano is a Kubernetes-native batch scheduling system that supplements the default kube-scheduler for AI, ML, big data, and HPC workloads. It integrates with popular frameworks such as Spark, TensorFlow, PyTorch, and MPI, giving teams a mature and flexible way to run complex, high-performance computing jobs on cloud-native infrastructure.

How It Works

Volcano extends Kubernetes scheduling with a custom scheduler and CRDs for defining batch jobs. It draws on concepts from kube-batch and incorporates over fifteen years of experience operating high-performance workloads. This design enables features such as gang scheduling (a job's pods start only when the required minimum number can all be placed), resource preemption, and topology-aware scheduling, which improve resource utilization and job execution for demanding applications.
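The all-or-nothing idea behind gang scheduling can be illustrated with a small toy model. This is a minimal sketch, not Volcano's actual algorithm: the GangJob class, the gang_schedule function, the CPU-only resource model, and the first-fit placement are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GangJob:
    """Hypothetical job description used only in this sketch."""
    name: str
    min_available: int  # number of pods that must start together
    cpu_per_pod: int    # CPU units requested by each pod


def gang_schedule(jobs, free_cpu_per_node):
    """Admit a job only if all of its min_available pods fit at once.

    Returns (admitted, pending), where admitted maps a job name to the
    node index chosen for each pod. Placement is naive first-fit.
    """
    free = list(free_cpu_per_node)
    admitted, pending = {}, []
    for job in jobs:
        trial = list(free)  # tentative state, committed only on success
        placement = []
        for _ in range(job.min_available):
            node = next(
                (i for i, cpu in enumerate(trial) if cpu >= job.cpu_per_pod),
                None,
            )
            if node is None:
                break  # one pod cannot be placed, so the whole gang waits
            trial[node] -= job.cpu_per_pod
            placement.append(node)
        if len(placement) == job.min_available:
            free = trial  # commit: every pod of the gang has a node
            admitted[job.name] = placement
        else:
            pending.append(job.name)  # nothing is reserved for this job
    return admitted, pending


if __name__ == "__main__":
    jobs = [
        GangJob("train-a", min_available=3, cpu_per_pod=2),
        GangJob("train-b", min_available=4, cpu_per_pod=2),
        GangJob("etl", min_available=1, cpu_per_pod=1),
    ]
    admitted, pending = gang_schedule(jobs, free_cpu_per_node=[4, 4])
    print("admitted:", admitted)
    print("pending:", pending)
```

In this sketch a job that cannot be placed in full holds no resources, so a later, smaller job can still use the remaining capacity. A scheduler without this property could start only part of a distributed training job and leave those pods idle while they wait for their peers.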

Quick Start & Requirements

  • Install: kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml, or via Helm (after adding the volcano-sh chart repository): helm install volcano volcano-sh/volcano -n volcano-system --create-namespace.
  • Prerequisites: Kubernetes 1.12+ with CRD support.
  • Resources: Installation via YAML creates volcano-admission, volcano-controllers, and volcano-scheduler pods in the volcano-system namespace.
  • Docs: Quick Start Guide, Helm Charts.
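Once the components are running, a batch job is described through Volcano's Job CRD. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The field names (batch.volcano.sh/v1alpha1, schedulerName, queue, minAvailable, tasks) follow Volcano's published Job examples as best recalled and should be checked against the CRDs of the installed version. The helper volcano_job_manifest, the job name, the image, and the command are hypothetical.

```python
import json


def volcano_job_manifest(name, image, replicas, min_available, command,
                         queue="default"):
    """Build a minimal Volcano Job manifest (illustrative sketch only)."""
    if not 1 <= min_available <= replicas:
        raise ValueError("min_available must be between 1 and replicas")
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "schedulerName": "volcano",     # hand the pods to Volcano
            "queue": queue,                 # assumed default queue name
            "minAvailable": min_available,  # gang size: start together or wait
            "tasks": [
                {
                    "name": "worker",
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {
                                    "name": "worker",
                                    "image": image,
                                    "command": command,
                                }
                            ],
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = volcano_job_manifest(
        name="demo-job",        # hypothetical job name
        image="busybox:1.36",   # hypothetical image
        replicas=3,
        min_available=3,
        command=["sh", "-c", "echo hello from volcano"],
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into kubectl apply -f - on a cluster where Volcano is installed. Setting min_available equal to replicas asks for strict all-or-nothing startup; a lower value lets the job begin with a partial set of workers.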

Highlighted Details

  • CNCF incubating project with widespread adoption across industries.
  • Integrates with Spark, Flink, Ray, TensorFlow, PyTorch, Argo, MindSpore, PaddlePaddle, Kubeflow, MPI, Horovod, MXNet, KubeGene.
  • Supports AI, ML, Big Data, and HPC workloads.
  • Offers features like gang scheduling and GPU fragmentation prevention.

Maintenance & Community

  • Active community with hundreds of contributors.
  • Weekly/bi-weekly community meetings.
  • Contact: Slack Channel, Mailing List, WeChat.

Licensing & Compatibility

  • Apache 2.0 License.
  • Compatible with Kubernetes versions 1.17 through 1.32 (see the version compatibility table in the README for details).

Limitations & Caveats

The README notes that the one-click install script ./hack/local-up-volcano.sh is currently available only for the x86_64 architecture. Support for older Kubernetes versions (< v1.16) relies on deprecated CRDs.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 60
  • Issues (30d): 22
  • Star History: 77 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

0.3%
4k
AI inference pipeline framework
Created 1 year ago
Updated 2 days ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
Created 6 years ago
Updated 1 month ago