bigdata-ecosystem by zenkay

JSON dataset of big data projects

created 11 years ago
577 stars

Top 56.8% on sourcepulse

Project Summary

This repository provides a curated JSON dataset of projects and papers related to the Big Data ecosystem. It serves as a broad reference for researchers, engineers, and practitioners who want to survey the landscape of big data technologies, tools, and foundational research. In the author's own words, the dataset is "incomplete-but-useful": it aims to support discovery and comparison within the field rather than to be exhaustive.

How It Works

The project maintains two primary directories: projects-data and papers-data. Each directory contains JSON files, where each file represents a single big data project or research paper. The JSON schema includes fields for name, description, abstract, category, tags, and relevant links, allowing for structured data representation and easy querying. Contributions are made by adding new JSON files to these directories.
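Because every entry is a standalone JSON file, the dataset can be queried with a simple directory scan. The sketch below assumes a local clone containing a projects-data directory, and assumes each file holds one JSON object with lowercase `name`, `category`, and `tags` (a list of strings) fields; check a few real files before relying on these names. The helpers `load_entries` and `find_by_tag`, and the clone path, are hypothetical.

```python
import json
from pathlib import Path


def load_entries(directory):
    """Load every *.json file in `directory` into a list of dicts.

    Assumes each file holds one JSON object describing a project or paper.
    Files are read in sorted filename order so results are deterministic.
    """
    entries = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            entries.append(json.load(f))
    return entries


def find_by_tag(entries, tag):
    """Return entries whose (assumed) `tags` list contains `tag`, ignoring case."""
    wanted = tag.lower()
    return [
        entry
        for entry in entries
        if wanted in (t.lower() for t in entry.get("tags", []))
    ]


if __name__ == "__main__":
    # Hypothetical path to a local clone of the repository.
    projects = load_entries("bigdata-ecosystem/projects-data")
    for entry in find_by_tag(projects, "hadoop"):
        print(entry.get("name"), "-", entry.get("category"))
```

Keeping each record in its own file means a query like this needs no database or index; the trade-off is that every lookup re-reads the directory, which is acceptable for a dataset of this size.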

Highlighted Details

  • Extensive categorization of Big Data technologies, including frameworks, distributed filesystems, databases (key-value, document, graph, columnar, time-series), SQL-like processing, machine learning, benchmarking, and system deployment tools.
  • Includes a curated list of influential research papers, categorized by publication year, providing historical context and foundational knowledge.
  • Actively encourages community contributions to expand and refine the dataset.
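The categorization described above can be summarized directly from the parsed entries. A minimal sketch, assuming the entries are already loaded as a list of dicts with a `name` field and whatever grouping field you choose (`category` for projects; for papers, a year field whose actual name is an assumption here). The helpers `group_by` and `count_by`, and the sample categories, are hypothetical and for illustration only.

```python
from collections import Counter, defaultdict


def group_by(entries, field, missing="(none)"):
    """Map each value of `field` to the names of the entries that have it.

    Entries lacking the field are collected under the `missing` key.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.get(field, missing)].append(entry.get("name", "?"))
    return dict(groups)


def count_by(entries, field, missing="(none)"):
    """Count entries per value of `field`, most common first."""
    return Counter(entry.get(field, missing) for entry in entries).most_common()


if __name__ == "__main__":
    # Illustrative sample records, not taken from the real dataset.
    sample = [
        {"name": "Apache Hadoop", "category": "Framework"},
        {"name": "Apache Spark", "category": "Framework"},
        {"name": "Redis", "category": "Key-Value Database"},
    ]
    print(group_by(sample, "category"))
    print(count_by(sample, "category"))
```

Using a `missing` bucket instead of skipping entries makes gaps in the data visible, which is useful for a dataset that describes itself as incomplete.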

Maintenance & Community

The project appears to be maintained by a single contributor, zenkay. There are no explicit links to community channels like Discord or Slack, nor a public roadmap.

Licensing & Compatibility

  • License: Creative Commons Attribution-ShareAlike 4.0 International License.
  • Compatibility: This license allows broad use, modification, and redistribution, including for commercial purposes, provided appropriate attribution is given and derivative works are distributed under the same or a compatible license.

Limitations & Caveats

The dataset is explicitly described as "incomplete-but-useful," so it does not cover every project or paper in the Big Data landscape. The last commit was three years ago and there is no stated maintenance policy or roadmap, so future updates are uncertain.

Health Check
Last commit: 3 years ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 90 days
