Kubernetes API for deploying a group of pods as a unit of replication
The LeaderWorkerSet (LWS) API is a Kubernetes-native way to deploy replicated groups of pods. It targets AI/ML inference workloads, such as LLMs sharded across multiple nodes. Users define a "super pod" made up of one leader and multiple workers, and Kubernetes manages that group as a single unit for scaling, rolling updates, and topology-aware placement, which simplifies complex distributed deployments.
How It Works
LWS introduces a custom resource that defines a group of pods: one leader and a configurable number of workers. Each group is treated as an atomic unit for lifecycle management. The resource accepts two pod templates (one for the leader, one for the workers) and allows the pods within a group to be created in parallel. The API supports topology-aware placement, so that the pods of a group can be co-located, and offers an "all-or-nothing" restart policy that handles failures at the group level rather than pod by pod.
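The group structure described above can be sketched as a manifest built in Python. This is a minimal illustration, not the project's reference configuration: the field names (`leaderWorkerTemplate`, `size`, `restartPolicy`, `leaderTemplate`, `workerTemplate`), the API version string, and the rule that `size` counts the leader are assumptions about the upstream CRD and should be checked against the installed version. The helper names, the resource name, and the container image are hypothetical.

```python
import json


def build_lws_manifest(name, image, groups, workers_per_group):
    """Return a LeaderWorkerSet-style manifest as a plain dict.

    Field names are assumptions about the LWS CRD; verify them against
    the version installed in your cluster before applying.
    """
    if groups < 1 or workers_per_group < 0:
        raise ValueError("need at least one group and a non-negative worker count")

    def pod_template(role):
        # One container per pod; leader and workers get separate templates.
        return {
            "metadata": {"labels": {"role": role}},
            "spec": {"containers": [{"name": role, "image": image}]},
        }

    return {
        "apiVersion": "leaderworkerset.x-k8s.io/v1",  # assumed group/version
        "kind": "LeaderWorkerSet",
        "metadata": {"name": name},
        "spec": {
            # Number of leader+worker groups, each replicated as one unit.
            "replicas": groups,
            "leaderWorkerTemplate": {
                # Assumed to count the leader: 1 leader + N workers.
                "size": 1 + workers_per_group,
                # Assumed name of the all-or-nothing group restart policy.
                "restartPolicy": "RecreateGroupOnPodRestart",
                "leaderTemplate": pod_template("leader"),
                "workerTemplate": pod_template("worker"),
            },
        },
    }


def total_pods(manifest):
    """Total pods across all groups: groups * pods per group."""
    spec = manifest["spec"]
    return spec["replicas"] * spec["leaderWorkerTemplate"]["size"]


if __name__ == "__main__":
    # Hypothetical name and image, used only for illustration.
    manifest = build_lws_manifest(
        "llm-inference",
        "example.com/inference-server:latest",
        groups=2,
        workers_per_group=3,
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed output could be saved to a file and applied once the field names have been confirmed. Under the stated assumptions, scaling `groups` adds or removes whole leader-plus-worker groups rather than individual pods.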
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is delivered as a Kubernetes API, so it requires a Kubernetes cluster for deployment and operation. The README does not detail performance characteristics or resource requirements for AI/ML workloads.