data-on-eks by awslabs

EKS blueprints for data and ML platform deployment

Created 3 years ago

852 stars

Top 41.2% on SourcePulse


1 Expert Loves This Project

Ed Huang

Cofounder of PingCAP

Project Summary

Data on Amazon EKS (DoEKS) provides optimized blueprints for deploying and scaling data platforms on Amazon Elastic Kubernetes Service (EKS). It targets users who need to run analytics, batch processing, stream processing, and workflow orchestration workloads on Kubernetes, and it reduces the effort of selecting and configuring the tools involved.

How It Works

DoEKS leverages Kubernetes operators and popular open-source data frameworks like Apache Spark, Apache Flink, Apache Kafka, and Apache Airflow. It offers opinionated, ready-to-deploy blueprints that integrate these tools with EKS, providing end-to-end logging and observability. This approach aims to streamline the deployment of complex data stacks, enabling users to build scalable and resilient data platforms with reduced operational overhead.
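As a rough illustration of the operator pattern described above, the sketch below builds a SparkApplication custom resource as a plain Python dictionary and prints it as JSON, which kubectl can apply. It assumes the Kubeflow Spark Operator CRD (`sparkoperator.k8s.io/v1beta2`) is installed in the cluster. The function name, image, namespace, service account, file path, and resource sizes are hypothetical placeholders, not values taken from any DoEKS blueprint.

```python
import json


def spark_application(name, namespace, image, main_file, executors=2):
    """Build a SparkApplication custom resource as a plain dict.

    Assumption: the cluster runs the Kubeflow Spark Operator, which
    watches resources of this apiVersion and kind. All values passed
    in here are illustrative placeholders.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.0",  # assumed version, adjust to the image
            "driver": {
                "cores": 1,
                "memory": "2g",
                "serviceAccount": "spark",  # hypothetical service account
            },
            "executor": {"instances": executors, "cores": 1, "memory": "2g"},
        },
    }


if __name__ == "__main__":
    manifest = spark_application(
        name="demo-job",
        namespace="spark-team-a",          # hypothetical namespace
        image="example/spark:latest",      # hypothetical image
        main_file="local:///opt/app/job.py",
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it with `kubectl apply -f` hands the job to the operator, which then creates the driver and executor pods. The actual blueprints ship their own manifests, so treat this only as a picture of the mechanism.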

Quick Start & Requirements

Deployment blueprints are available on the DoEKS website.
Requires an Amazon EKS cluster. Specific prerequisites for each blueprint (e.g., Karpenter for EMR-on-EKS) are detailed in the documentation.
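Before working through a blueprint, it can help to confirm that the command-line tools it relies on are installed. The sketch below checks for a few tools on the local PATH. The tool list (`aws`, `kubectl`, `terraform`) is an assumption for illustration, and the helper name `missing_tools` is hypothetical; the documentation for each blueprint is the authority on what is actually required.

```python
import shutil

# Assumed tool list for illustration only; check each blueprint's
# documentation for its real prerequisites.
ASSUMED_TOOLS = ("aws", "kubectl", "terraform")


def missing_tools(tools=ASSUMED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed locally.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```

This only verifies that the binaries exist. It does not check versions, AWS credentials, or whether an EKS cluster is reachable.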

Highlighted Details

Blueprints for EMR-on-EKS with Karpenter for cost-effective autoscaling.
Includes self-managed Spark with YuniKorn, Flink Operator, Strimzi Kafka, and Airflow on EKS.
Supports Kubernetes-native workflow engines like Argo Workflows.
Focuses on Data Analytics, Streaming Platforms, Scheduler Workflow Platforms, and Distributed Databases/Query Engines on EKS.

Maintenance & Community

Maintained by AWS Solutions Architects, with community support provided on a best-effort basis through GitHub Issues. The surrounding open-source community focuses on data engineering, streaming, and analytics on Kubernetes.

Licensing & Compatibility

Licensed under the Apache License 2.0, which permits commercial use and linking from closed-source software.

Limitations & Caveats

The project is in active development, with a recent split into separate repositories for Data and AI/ML workloads. Users should direct AI/ML-related contributions to the new AI on EKS repository.

Health Check

Last Commit

1 day ago

Responsiveness

1 week

Star History

3 stars in the last 30 days