dstack by dstackai

Open-source tool for simplifying GPU allocation and AI workload orchestration

Created 4 years ago

2,005 stars

Top 21.9% on SourcePulse

Project Summary

dstack provides an open-source platform for orchestrating AI workloads and managing GPU resources, serving as an alternative to Kubernetes and Slurm for ML teams. It simplifies the allocation and deployment of jobs, services, and development environments across diverse hardware, including NVIDIA, AMD, Google TPUs, and Intel Gaudi accelerators, on cloud and on-premise infrastructure.

How It Works

With dstack, users define infrastructure and workload configurations in YAML files, covering dev environments, tasks, services, fleets, volumes, and gateways. They apply these configurations via the CLI or API, and dstack then automates provisioning, job queuing, scaling, networking, and failure management across heterogeneous compute resources. This declarative approach abstracts away the complexities of distributed systems and cloud provider specifics.
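To make the declarative workflow more concrete, here is a minimal sketch in Python that writes a small task configuration to disk. The file name, the task name, the command, and the GPU requirement are all illustrative assumptions and do not come from the summary above. The field names follow the general shape of dstack's YAML configurations, but the exact schema should be checked against the official docs.

```python
from pathlib import Path
import tempfile

# Hypothetical task configuration: the name, command, and GPU size are
# illustrative assumptions, not values taken from the project summary.
TASK_CONFIG = """\
type: task
name: train-example
commands:
  - python train.py
resources:
  gpu: 24GB
"""


def write_config(directory: Path, filename: str = "train.dstack.yml") -> Path:
    """Write the example configuration into `directory` and return its path."""
    path = directory / filename
    path.write_text(TASK_CONFIG)
    return path


if __name__ == "__main__":
    # Write into a throwaway directory so the sketch has no side effects
    # on an existing project.
    out = write_config(Path(tempfile.mkdtemp()))
    print(out.read_text())
    # The file would then be handed to the CLI, presumably along the lines of:
    #   dstack apply -f train.dstack.yml
```

Keeping the configuration in a file rather than in command-line flags is what makes the approach declarative: the same file can be versioned, reviewed, and re-applied.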

Quick Start & Requirements

Installation: pip install "dstack[all]" or uv tool install "dstack[all]".
Prerequisites: Git, OpenSSH. Server requires Linux, macOS, or Windows (WSL 2). CLI is available for Linux, macOS, and Windows.
Setup: Install the server, configure backends (e.g., in ~/.dstack/server/config.yml), start the server (dstack server), then configure the CLI (dstack config --url ... --project ... --token ...).
Links: Docs, Discord, Contributing.

Highlighted Details

Supports NVIDIA, AMD, Google TPU, and Intel Gaudi accelerators.
Manages cloud and on-premise clusters via SSH fleets.
Handles provisioning, job queuing, auto-scaling, networking, volumes, and failure recovery.
Offers configurations for dev environments, tasks, services, fleets, volumes, and gateways.
Recent updates include GPU blocks, proxy jumps, inactivity duration, Intel Gaudi support, Vultr integration, and AWS Capacity Reservations.

Maintenance & Community

The project is actively maintained with frequent updates. A Discord community is available for support and discussion.

Licensing & Compatibility

License: Mozilla Public License 2.0.
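Compatibility: MPL 2.0 is a file-level (weak) copyleft license rather than a permissive one. It allows commercial use and combination with closed-source code, provided that modifications to MPL-covered files are themselves released under the MPL.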

Limitations & Caveats

dstack positions itself as an alternative to established systems such as Kubernetes and Slurm, so its ecosystem and community are likely smaller than those of more mature platforms. The README provides no performance benchmarks or detailed comparisons.

Health Check

Last commit: 1 day ago
Responsiveness: 1 day
Star history: 32 stars in the last 30 days