kuberay by ray-project

Kubernetes operator for Ray application deployment

created 4 years ago
1,924 stars

Top 23.1% on sourcepulse

Project Summary

KubeRay is an open-source Kubernetes operator designed to simplify the deployment and management of Ray applications on Kubernetes. It targets ML engineers and data scientists who need to scale distributed AI workloads, offering robust lifecycle management, autoscaling, and fault tolerance for Ray clusters.

How It Works

KubeRay leverages Kubernetes Custom Resource Definitions (CRDs) including RayCluster, RayJob, and RayService. RayCluster manages the full lifecycle of Ray clusters, enabling autoscaling and fault tolerance. RayJob automates the creation of a Ray cluster and job submission, with optional cluster cleanup. RayService combines a RayCluster with a Ray Serve deployment graph, facilitating zero-downtime upgrades and high availability for model serving.
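
As a rough illustration of the resources described above, the following sketch builds a RayCluster manifest with autoscaling bounds and a RayJob manifest with optional cluster cleanup, both as plain Python dictionaries. The image tag, resource limits, group name, and helper function names are assumptions made for this example; the field names follow the ray.io/v1 API as commonly documented, but check them against the KubeRay version you run.

```python
import json

# Assumed image tag for illustration; use the tag that matches your Ray version.
RAY_IMAGE = "rayproject/ray:2.9.0"


def ray_pod_template(container_name: str) -> dict:
    """Build a minimal pod template used by both head and worker groups."""
    return {
        "spec": {
            "containers": [
                {
                    "name": container_name,
                    "image": RAY_IMAGE,
                    # Hypothetical limits; size these for your workload.
                    "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
                }
            ]
        }
    }


def ray_cluster_spec(min_workers: int, max_workers: int) -> dict:
    """Cluster spec with one head group and one autoscaled worker group."""
    return {
        "enableInTreeAutoscaling": True,
        "headGroupSpec": {
            "rayStartParams": {},
            "template": ray_pod_template("ray-head"),
        },
        "workerGroupSpecs": [
            {
                "groupName": "workers",  # hypothetical group name
                "replicas": min_workers,
                "minReplicas": min_workers,
                "maxReplicas": max_workers,
                "rayStartParams": {},
                "template": ray_pod_template("ray-worker"),
            }
        ],
    }


def ray_cluster(name: str, min_workers: int, max_workers: int) -> dict:
    """Wrap the cluster spec in a RayCluster custom resource."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": ray_cluster_spec(min_workers, max_workers),
    }


def ray_job(name: str, entrypoint: str, cleanup: bool = True) -> dict:
    """RayJob that creates its own cluster and optionally removes it afterwards."""
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {"name": name},
        "spec": {
            "entrypoint": entrypoint,
            "shutdownAfterJobFinishes": cleanup,
            "rayClusterSpec": ray_cluster_spec(1, 1),
        },
    }


if __name__ == "__main__":
    print(json.dumps(ray_cluster("demo-cluster", 1, 5), indent=2))
```

Building manifests as dictionaries keeps the head and worker templates in one place; the same cluster spec is reused inside the RayJob, which mirrors how a RayJob embeds a cluster definition.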

Quick Start & Requirements

  • Installation is typically done via kubectl apply -f <manifest-file>.
  • Requires a Kubernetes cluster.
  • Official documentation and examples are hosted on the Ray documentation site.
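
The kubectl apply -f step above can be scripted. The sketch below writes a manifest to a JSON file (kubectl accepts JSON as well as YAML) and assembles the matching command. The helper names and the sample manifest are hypothetical; the operator's own install manifest comes from the official documentation and is not reproduced here. By default the command is only returned, not executed.

```python
import json
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def write_manifest(manifest: dict, directory: str) -> Path:
    """Serialize a manifest dict to <directory>/<metadata.name>.json."""
    path = Path(directory) / f"{manifest['metadata']['name']}.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path


def kubectl_apply_command(manifest_path: Path, namespace: Optional[str] = None) -> List[str]:
    """Build the argument list for `kubectl apply -f <manifest-file>`."""
    cmd = ["kubectl", "apply", "-f", str(manifest_path)]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd


def apply_manifest(manifest_path: Path, namespace: Optional[str] = None,
                   dry_run: bool = True) -> List[str]:
    """Return the command; run it only if dry_run is False and kubectl is on PATH."""
    cmd = kubectl_apply_command(manifest_path, namespace)
    if not dry_run and shutil.which("kubectl") is not None:
        subprocess.run(cmd, check=True)
    return cmd
```

A possible use: call write_manifest with a RayCluster dictionary, then pass the resulting path to apply_manifest with dry_run=False once the cluster context is confirmed.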

Highlighted Details

  • Offers a kubectl ray plugin (Beta) for simplified workflows.
  • Includes an experimental KubeRay Dashboard for resource management.
  • Integrates with observability tools (Prometheus, Grafana), queuing systems (Volcano, Kueue), and ingress controllers.
  • Numerous case studies from companies like Google, Spotify, and Airbnb highlight its use in scaling ML platforms.

Maintenance & Community

  • Actively maintained by the Ray Project.
  • Community support is available via the Ray Slack workspace in the #kuberay-questions channel. Bi-weekly community meetings are held.

Licensing & Compatibility

  • Licensed under the Apache-2.0 License.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

  • The KubeRay APIServer is in Alpha, and the KubeRay Dashboard is Experimental and not yet production-ready.

Health Check

  • Last commit: 1 day ago.
  • Responsiveness: Inactive.
  • Pull requests (30d): 39.
  • Issues (30d): 32.
  • Star history: 215 stars in the last 90 days.
