terraform-provider-iterative  by iterative

Terraform plugin for managing ML workloads on cloud/K8s

created 4 years ago
295 stars

Top 90.7% on sourcepulse

GitHubView on GitHub
Project Summary

This Terraform provider simplifies cloud resource management for machine learning workloads, targeting data scientists and DevOps engineers. It enables cost reduction through spot instance recovery and auto-termination, while offering a unified, developer-first experience for cloud compute, abstracting away vendor-specific complexities.

How It Works

TPI leverages cloud-native scaling groups (AWS Auto Scaling Groups, Azure VM Scale Sets, GCP Managed Instance Groups, Kubernetes Jobs) to manage compute instances. It acts as a CLI tool, eliminating the need for a separate control plane. TPI handles data checkpointing and script execution, automatically recovering spot instances upon interruption and terminating resources upon task completion or failure, thereby minimizing costs and management overhead.

Quick Start & Requirements

  • Install Terraform 1.0+.
  • Cloud vendor authentication credentials via environment variables.
  • Create a main.tf file defining the task, cloud, machine type, and script.
  • Run terraform init to download providers.
  • Execute terraform apply to launch, upload, and run the task.
  • terraform refresh and terraform show to query status.
  • terraform destroy to terminate resources and download outputs.
  • Official docs: https://github.com/iterative/terraform-provider-iterative#readme

Highlighted Details

  • Transparent data checkpoint/restore and auto-respawning of spot/preemptible instances.
  • Unified abstraction across AWS, Azure, GCP, and Kubernetes.
  • Automatic cleanup of unused resources and termination of compute instances.
  • One-command data sync and code execution without external servers.

Maintenance & Community

  • Active Discord server for questions and feedback.
  • GitHub issues for feature requests and bug reports.
  • Future plans include more visual interfaces, distributed training support, and tighter ecosystem integration.

Licensing & Compatibility

  • Copyright and contributions distributed under an unspecified license.
  • Compatible with commercial use and closed-source linking, assuming the license permits.

Limitations & Caveats

The specific license is not detailed in the README, which could impact commercial adoption. While it supports multiple cloud providers, the depth of integration and feature parity across all platforms is not explicitly stated.

Health Check
Last commit

7 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
2
Star History
2 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.