data-on-eks  by awslabs

EKS blueprints for data and ML platform deployment

created 2 years ago
779 stars

Top 45.7% on sourcepulse

Project Summary

Data on Amazon EKS (DoEKS) provides optimized blueprints for deploying and scaling data platforms on Amazon Elastic Kubernetes Service (EKS). It targets teams that run analytics, batch processing, stream processing, workflow orchestration, and other data platform workloads on Kubernetes, and it reduces the effort of selecting and configuring the tools involved.

How It Works

DoEKS leverages Kubernetes operators and popular open-source data frameworks like Apache Spark, Apache Flink, Apache Kafka, and Apache Airflow. It offers opinionated, ready-to-deploy blueprints that integrate these tools with EKS, providing end-to-end logging and observability. This approach aims to streamline the deployment of complex data stacks, enabling users to build scalable and resilient data platforms with reduced operational overhead.
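To make the operator pattern concrete: with an operator installed, a job is described as a custom resource and the operator launches and manages the underlying pods. The sketch below is illustrative only and is not taken from the DoEKS blueprints. It assumes the Kubeflow Spark Operator and its `SparkApplication` resource (`sparkoperator.k8s.io/v1beta2`); the helper name `spark_application`, the namespace, image, service account, Spark version, and file path are all hypothetical placeholders.

```python
import json


def spark_application(name: str, namespace: str, image: str, main_file: str,
                      executors: int = 2) -> dict:
    """Build a minimal SparkApplication custom resource as a plain dict.

    Every value is supplied by the caller or is a placeholder; nothing here
    is copied from a DoEKS blueprint.
    """
    if executors < 1:
        raise ValueError("executors must be at least 1")
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": "3.5.0",  # placeholder; should match the image
            # The service account name is an assumption; it must already exist
            # and be allowed to create executor pods.
            "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
            "executor": {"instances": executors, "cores": 1, "memory": "2g"},
        },
    }


if __name__ == "__main__":
    manifest = spark_application(
        name="example-job",                    # hypothetical
        namespace="spark-team-a",              # hypothetical
        image="example.com/spark:3.5.0",       # hypothetical
        main_file="local:///opt/app/job.py",   # hypothetical
    )
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with `kubectl apply -f` would hand the job to the operator, provided the operator and the referenced service account are present in the cluster. The blueprints' own manifests may differ in fields and values.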

Quick Start & Requirements

  • Deployment blueprints are available on the DoEKS website.
  • Requires an Amazon EKS cluster. Specific prerequisites for each blueprint (e.g., Karpenter for EMR-on-EKS) are detailed in the documentation.

Highlighted Details

  • Blueprints for EMR-on-EKS with Karpenter for cost-effective autoscaling.
  • Includes self-managed Spark with YuniKorn, Flink Operator, Strimzi Kafka, and Airflow on EKS.
  • Supports Kubernetes-native workflow engines like Argo Workflows.
  • Focuses on Data Analytics, Streaming Platforms, Scheduler Workflow Platforms, and Distributed Databases/Query Engines on EKS.

Maintenance & Community

Maintained by AWS Solution Architects, with community support provided on a best-effort basis through GitHub Issues. The surrounding open-source community focuses on data engineering, streaming, and analytics on Kubernetes.

Licensing & Compatibility

Licensed under the Apache License 2.0, which permits commercial use and linking from closed-source software.

Limitations & Caveats

The project is in active development, with a recent split into separate repositories for Data and AI/ML workloads. Users should direct AI/ML-related contributions to the new AI on EKS repository.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 10
  • Issues (30d): 6
  • Star history: 38 stars in the last 90 days
