End-to-end data analytics demo on Google Cloud, pre-configured and ready to run
This repository provides an end-to-end demonstration of Google Cloud's data analytics stack, targeting engineers and data professionals who need to understand and showcase the integration of various services. It offers a fully configured environment with 70-700 million rows of data to illustrate real-world performance and scalability, enabling users to explore different data processing and analysis paths.
How It Works
The system orchestrates data pipelines using Airflow, ensuring services communicate securely over private IP addresses. It leverages a comprehensive suite of Google Cloud services, including BigQuery for analytics, Dataplex for data lake management, BQML for machine learning, and BigLake for accessing data across clouds (AWS, Azure). Recent updates include migrating from text-bison to Gemini Pro and replacing App Engine with Cloud Run.
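Airflow runs each pipeline as a DAG: a task starts only after every upstream task it depends on has finished. The sketch below illustrates only that ordering idea, using Python's standard-library `graphlib` rather than Airflow itself. The task names in `PIPELINE` and the helpers `run_order` and `run_pipeline` are hypothetical and do not correspond to the repository's actual DAGs.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks that must
# finish before it may start (illustrative only, not the repo's DAGs).
PIPELINE: dict[str, set[str]] = {
    "ingest_raw_to_gcs": set(),
    "create_biglake_tables": {"ingest_raw_to_gcs"},
    "load_bigquery": {"create_biglake_tables"},
    "register_in_dataplex": {"load_bigquery"},
    "train_bqml_model": {"load_bigquery"},
    "publish_dashboard": {"train_bqml_model", "register_in_dataplex"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order that respects every dependency.

    graphlib raises CycleError (a ValueError subclass) when the
    dependencies contain a cycle, similar to how Airflow refuses to
    load a DAG that is not acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


def run_pipeline(dag: dict[str, set[str]], actions: dict) -> list[str]:
    """Call each task's action in dependency order; return tasks completed."""
    completed = []
    for task in run_order(dag):
        actions[task]()  # hypothetical task body, e.g. submitting a query
        completed.append(task)
    return completed


if __name__ == "__main__":
    print(run_order(PIPELINE))
```

Airflow adds scheduling, retries, and distributed workers on top of this ordering; the sketch runs tasks sequentially in a single process to keep the dependency logic visible.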
Quick Start & Requirements
Deploy with deploy.sh, or with deploy-use-existing-project-non-org-admin.sh to deploy into an existing project without organization admin rights.
Highlighted Details
Maintenance & Community
This is an official Google Cloud Platform repository. It also links to demos for related technologies, such as Chocolate AI and Data Beans.
Licensing & Compatibility
The repository is released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
Deployment requires significant Google Cloud administrative privileges and can be complex due to organization policy configurations. Some features, like BigSearch on 50 billion rows, are noted as internal. Cloud Shell deployment is noted as potentially problematic.