Terraform plugin for managing ML workloads on cloud/K8s
Top 90.7% on sourcepulse
This Terraform provider simplifies cloud resource management for machine learning workloads, targeting data scientists and DevOps engineers. It enables cost reduction through spot instance recovery and auto-termination, while offering a unified, developer-first experience for cloud compute, abstracting away vendor-specific complexities.
How It Works
TPI leverages cloud-native scaling groups (AWS Auto Scaling Groups, Azure VM Scale Sets, GCP Managed Instance Groups, Kubernetes Jobs) to manage compute instances. It acts as a CLI tool, eliminating the need for a separate control plane. TPI handles data checkpointing and script execution, automatically recovering spot instances upon interruption and terminating resources upon task completion or failure, thereby minimizing costs and management overhead.
Quick Start & Requirements
main.tf
file defining the task, cloud, machine type, and script.terraform init
to download providers.terraform apply
to launch, upload, and run the task.terraform refresh
and terraform show
to query status.terraform destroy
to terminate resources and download outputs.Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The specific license is not detailed in the README, which could impact commercial adoption. While it supports multiple cloud providers, the depth of integration and feature parity across all platforms is not explicitly stated.
7 months ago
1 day