data-on-eks by awslabs

EKS blueprints for data and ML platform deployment

Created 3 years ago
812 stars

Top 43.6% on SourcePulse

Project Summary

Data on Amazon EKS (DoEKS) provides optimized blueprints for deploying and scaling data platforms on Amazon Elastic Kubernetes Service (EKS). It targets users who need to run analytics, batch processing, stream processing, workflow orchestration, and other data platform workloads on Kubernetes, and it simplifies tool selection and configuration.

How It Works

DoEKS builds on Kubernetes operators and popular open-source data frameworks such as Apache Spark, Apache Flink, Apache Kafka, and Apache Airflow. It offers opinionated, ready-to-deploy blueprints that integrate these tools with EKS and provide end-to-end logging and observability. The goal is to streamline the deployment of complex data stacks so that users can build scalable, resilient data platforms with less operational overhead.

Quick Start & Requirements

  • Deployment blueprints are available on the DoEKS website.
  • Requires an Amazon EKS cluster. Specific prerequisites for each blueprint (e.g., Karpenter for EMR-on-EKS) are detailed in the documentation.

Highlighted Details

  • Blueprints for EMR-on-EKS with Karpenter for cost-effective autoscaling.
  • Includes self-managed Spark with YuniKorn, Flink Operator, Strimzi Kafka, and Airflow on EKS.
  • Supports Kubernetes-native workflow engines like Argo Workflows.
  • Focuses on Data Analytics, Streaming Platforms, Scheduler Workflow Platforms, and Distributed Databases/Query Engines on EKS.

Maintenance & Community

Maintained by AWS Solution Architects, with community support provided on a best-effort basis through GitHub Issues. The project's open-source community focuses on data engineering, streaming, and analytics on Kubernetes.

Licensing & Compatibility

Licensed under the Apache 2.0 License. Compatible with commercial use and closed-source linking.

Limitations & Caveats

The project is in active development, with a recent split into separate repositories for Data and AI/ML workloads. Users should direct AI/ML-related contributions to the new AI on EKS repository.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 4
  • Issues (30d): 2
  • Star history: 11 stars in the last 30 days
