data-on-eks by awslabs

EKS blueprints for data and ML platform deployment

Created 3 years ago
802 stars

Top 44.0% on SourcePulse

Project Summary

Data on Amazon EKS (DoEKS) provides optimized blueprints for deploying and scaling data platforms on Amazon Elastic Kubernetes Service (EKS). It targets users who need to run analytics, batch processing, stream processing, workflow orchestration, and other data platform workloads on Kubernetes, and it reduces the effort of selecting and configuring the right tools.

How It Works

DoEKS leverages Kubernetes operators and popular open-source data frameworks like Apache Spark, Apache Flink, Apache Kafka, and Apache Airflow. It offers opinionated, ready-to-deploy blueprints that integrate these tools with EKS, providing end-to-end logging and observability. This approach aims to streamline the deployment of complex data stacks, enabling users to build scalable and resilient data platforms with reduced operational overhead.

Quick Start & Requirements

  • Deployment blueprints are available on the DoEKS website.
  • Requires an Amazon EKS cluster. Specific prerequisites for each blueprint (e.g., Karpenter for EMR-on-EKS) are detailed in the documentation.
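As a rough illustration of the Karpenter prerequisite, here is a minimal sketch (not taken from the DoEKS blueprints) that builds a Karpenter NodePool manifest in Python and prints it as JSON. `kubectl apply -f -` accepts JSON as well as YAML. The sketch assumes Karpenter's v1 API (`karpenter.sh/v1`) and an existing `EC2NodeClass` named `default`. The pool name `spark-compute` and the CPU limit are hypothetical values.

```python
import json


def karpenter_nodepool(name: str, cpu_limit: int = 1000) -> dict:
    """Build a Karpenter v1 NodePool manifest (hypothetical example values).

    Assumes Karpenter v1 is installed and an EC2NodeClass named "default" exists.
    """
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Allowing both capacity types lets Karpenter favor Spot
                    # and fall back to On-Demand when Spot is unavailable.
                    "requirements": [
                        {
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": ["spot", "on-demand"],
                        }
                    ],
                    # Hypothetical: references a pre-existing EC2NodeClass.
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": "default",
                    },
                }
            },
            # Upper bound on the total CPU this pool may provision.
            "limits": {"cpu": cpu_limit},
            # Consolidate empty or underused nodes to reduce cost.
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "1m",
            },
        },
    }


if __name__ == "__main__":
    # Usage: python nodepool.py | kubectl apply -f -
    print(json.dumps(karpenter_nodepool("spark-compute"), indent=2))
```

Mixing Spot and On-Demand capacity and enabling consolidation are two common ways to keep autoscaled data workloads cost-effective. Check the blueprint documentation for the node pools DoEKS actually configures.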

Highlighted Details

  • Blueprints for EMR-on-EKS with Karpenter for cost-effective autoscaling.
  • Includes self-managed Spark with YuniKorn, Flink Operator, Strimzi Kafka, and Airflow on EKS.
  • Supports Kubernetes-native workflow engines like Argo Workflows.
  • Focuses on Data Analytics, Streaming Platforms, Scheduler Workflow Platforms, and Distributed Databases/Query Engines on EKS.
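The following hypothetical sketch illustrates the self-managed Spark and YuniKorn combination. It builds a `SparkApplication` manifest for the Kubeflow Spark Operator and sets `batchScheduler` to `yunikorn`, which hands driver and executor pod scheduling to YuniKorn. The sketch assumes that both the Spark Operator and YuniKorn are already installed. The namespace, service account, image tag, and jar path are illustrative assumptions, not DoEKS defaults.

```python
import json

SPARK_VERSION = "3.5.1"  # assumption: the official apache/spark image tag used below


def spark_pi_application(namespace: str = "spark-team-a") -> dict:
    """Build a SparkApplication that runs the SparkPi example under YuniKorn.

    Namespace, service account, and image are hypothetical example values.
    """
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": "spark-pi", "namespace": namespace},
        "spec": {
            "type": "Scala",
            "mode": "cluster",
            # Use YuniKorn instead of the default kube-scheduler.
            "batchScheduler": "yunikorn",
            "image": f"spark:{SPARK_VERSION}",
            "mainClass": "org.apache.spark.examples.SparkPi",
            # Assumed location of the bundled examples jar inside the image.
            "mainApplicationFile": (
                "local:///opt/spark/examples/jars/"
                f"spark-examples_2.12-{SPARK_VERSION}.jar"
            ),
            "sparkVersion": SPARK_VERSION,
            # Hypothetical service account with permission to create executor pods.
            "driver": {"cores": 1, "memory": "1g", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": 2, "memory": "2g"},
        },
    }


if __name__ == "__main__":
    # Usage: python spark_pi.py | kubectl apply -f -
    print(json.dumps(spark_pi_application(), indent=2))
```

Because YuniKorn schedules at the application level, it can apply queue quotas and gang scheduling to all of a job's pods together. The default scheduler places each pod independently.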

Maintenance & Community

Maintained by AWS Solutions Architects, with best-effort community support through GitHub Issues. The project also hosts an open-source community focused on data engineering, streaming, and analytics on Kubernetes.

Licensing & Compatibility

Licensed under the Apache 2.0 License. Compatible with commercial use and closed-source linking.

Limitations & Caveats

The project is in active development, with a recent split into separate repositories for Data and AI/ML workloads. Users should direct AI/ML-related contributions to the new AI on EKS repository.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 8
  • Issues (30d): 2
  • Star History: 12 stars in the last 30 days
