skypilot  by skypilot-org

Framework for cloud AI/batch jobs, unifying execution across diverse infrastructure

created 4 years ago
8,434 stars

Top 6.2% on sourcepulse

Project Summary

SkyPilot is an open-source framework designed to simplify the execution of AI and batch workloads across diverse infrastructure, including Kubernetes clusters and over 16 cloud providers. It targets AI practitioners, researchers, and engineers by offering a unified interface for provisioning resources, managing jobs, and optimizing costs, thereby abstracting away the complexities of different cloud environments and hardware.

How It Works

SkyPilot takes a declarative approach: users specify compute requirements, data synchronization, setup commands, and execution commands in a single format (YAML or the Python API). The framework then selects the most cost-effective infrastructure that is currently available, provisions virtual machines, synchronizes code, runs the setup scripts, and executes the job. This abstraction layer makes workloads portable and reduces vendor lock-in, so users can switch between, or combine, multiple cloud providers and Kubernetes clusters with minimal changes.
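
The flow described above can be illustrated with a small, self-contained sketch. It is a toy model, not SkyPilot's actual optimizer or API: the `Task` and `Offer` classes, the provider names, and every price are hypothetical, chosen only to show how a declarative spec might be matched against a catalog to pick the cheapest option that currently has capacity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Task:
    """Declarative description of a job (hypothetical, for illustration)."""
    accelerator: str  # e.g. "A100"
    setup: str        # shell commands run once after provisioning
    run: str          # shell commands that execute the job


@dataclass(frozen=True)
class Offer:
    """One catalog entry: a place the task could run (made-up values)."""
    provider: str
    accelerator: str
    hourly_usd: float
    available: bool


def pick_cheapest(task: Task, catalog: list[Offer]) -> Optional[Offer]:
    """Return the cheapest available offer matching the task, or None."""
    candidates = [
        o for o in catalog
        if o.available and o.accelerator == task.accelerator
    ]
    return min(candidates, key=lambda o: o.hourly_usd, default=None)


task = Task(
    accelerator="A100",
    setup="pip install -r requirements.txt",
    run="python train.py",
)

catalog = [
    Offer("cloud-a", "A100", 3.10, available=False),  # cheapest, but no capacity
    Offer("cloud-b", "A100", 3.40, available=True),
    Offer("k8s-cluster", "A100", 3.90, available=True),
    Offer("cloud-b", "V100", 1.20, available=True),   # wrong accelerator
]

choice = pick_cheapest(task, catalog)
print(choice.provider if choice else "no capacity")  # prints "cloud-b"
```

The user states only what the job needs; where it runs is decided by filtering and ranking the catalog, which is why the same task description can move between providers unchanged.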

Quick Start & Requirements

  • Install with pip: pip install -U "skypilot[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,nebius]" (keep only the extras for the providers you need).
  • Nightly build: pip install "skypilot-nightly[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,nebius]"
  • Documentation: Installation, Quickstart, CLI Reference.
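
Since the extras list is meant to be trimmed to the providers actually in use, the install command can be assembled from a short list. The helper below is a hypothetical convenience and is not part of SkyPilot; it only formats a string, using the extras names that appear in the commands above.

```python
# Extras names copied from the install commands above.
KNOWN_EXTRAS = (
    "kubernetes", "aws", "gcp", "azure", "oci", "lambda", "runpod",
    "fluidstack", "paperspace", "cudo", "ibm", "scp", "nebius",
)


def pip_install_command(providers, nightly=False):
    """Build the pip command for the chosen providers (hypothetical helper)."""
    if not providers:
        raise ValueError("choose at least one provider")
    unknown = [p for p in providers if p not in KNOWN_EXTRAS]
    if unknown:
        raise ValueError(f"extras not listed above: {unknown}")
    package = "skypilot-nightly" if nightly else "skypilot"
    upgrade = "" if nightly else "-U "  # mirrors the two commands above
    return f'pip install {upgrade}"{package}[{",".join(providers)}]"'


print(pip_install_command(["kubernetes", "aws"]))
# prints: pip install -U "skypilot[kubernetes,aws]"
```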

Highlighted Details

  • Supports over 16 cloud providers and Kubernetes.
  • Features automatic resource cleanup (autostop) and spot instance support with preemption recovery for cost savings.
  • Schedules jobs onto the cheapest infrastructure that currently has capacity.
  • Offers a unified interface for training, serving, and deploying various AI models and applications.
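
The idea behind spot instances with preemption recovery can be sketched as a relaunch loop that resumes from the last checkpoint. This summary does not describe SkyPilot's real recovery mechanism, so the following is only a toy simulation: `Preempted`, `run_job`, and `run_with_recovery` are hypothetical names, and the preemption points are supplied by hand.

```python
class Preempted(Exception):
    """Raised when the (simulated) spot instance is reclaimed."""


def run_job(start_step, total_steps, preempt_at, checkpoint):
    """Run steps from start_step, saving progress after each one."""
    for step in range(start_step, total_steps):
        if step in preempt_at:
            preempt_at.discard(step)   # each simulated preemption fires once
            raise Preempted(step)
        checkpoint["step"] = step + 1  # progress survives a preemption


def run_with_recovery(total_steps, preempt_at, max_launches=5):
    """Relaunch after each preemption; return the number of launches used.

    Note: preempt_at is mutated as simulated preemptions are consumed.
    """
    checkpoint = {"step": 0}
    launches = 0
    while launches < max_launches:
        launches += 1
        try:
            run_job(checkpoint["step"], total_steps, preempt_at, checkpoint)
            return launches
        except Preempted:
            continue  # relaunch and resume from the checkpoint
    raise RuntimeError("gave up after repeated preemptions")


launches = run_with_recovery(total_steps=10, preempt_at={3, 7})
print(launches)  # prints 3: the initial launch plus two recoveries
```

Because progress is checkpointed, each relaunch repeats no completed steps, which is what lets cheaper, interruptible capacity be used for long-running jobs.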

Maintenance & Community

  • Project originated at UC Berkeley's Sky Computing Lab, with significant industry contributions.
  • Community channels: Slack, X/Twitter, LinkedIn.
  • Contribution guidelines available in CONTRIBUTING.

Licensing & Compatibility

  • Apache License 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is under active development, with frequent updates and new model integrations, so the API may change between releases. Users should consult the documentation for the current list of supported providers and features.

Health Check

  • Last commit: 14 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 270
  • Issues (30d): 131

Star History

  • 463 stars in the last 90 days
