bigdata by haifengl

Big data intro PDF/EPUB: Hadoop, Spark, NoSQL overview for architects/developers

Created 10 years ago
397 stars

Top 72.7% on SourcePulse

Project Summary

This repository holds a book-length introduction to Big Data technologies, written for software architects and advanced developers. It explains the core concepts and architectural patterns behind Big Data systems so that readers can understand how those systems are designed and make informed technology choices.

How It Works

The book covers the Big Data landscape in order, starting with foundational concepts such as the "3Vs" (volume, velocity, variety) and business use cases. It then moves to key technologies: Apache Hadoop (HDFS, MapReduce, YARN, Tez), Apache Spark, and several NoSQL databases (HBase, Riak, Cassandra, MongoDB). The explanations focus on how and why each system is designed the way it is, rather than on the usage of specific tools, so the material stays relevant as tools change.

Quick Start & Requirements

This repository contains the content of a book. No installation or execution is required to access the information. The content is presented in Markdown format.

Highlighted Details

  • Detailed explanations of distributed file systems like HDFS and their architectural trade-offs.
  • In-depth analysis of processing frameworks like MapReduce and Spark, including their programming models and performance characteristics.
  • Comprehensive overview of NoSQL databases, covering their data models, storage mechanisms, consistency guarantees, and architectural patterns.
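
To make the MapReduce programming model mentioned above concrete, here is a minimal single-process sketch in Python. It is not taken from the book. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and the sketch only illustrates the map, shuffle, and reduce steps that a framework such as Hadoop would distribute across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over an in-memory list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    docs = ["big data big ideas", "data systems"]
    print(word_count(docs))
```

In a real cluster the map and reduce functions run in parallel on different machines and the shuffle moves data over the network. This sketch keeps everything in one process so the data flow is easy to follow.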

Maintenance & Community

This repository appears to be static content for a book, with no active development or community interaction indicated.

Licensing & Compatibility

The repository does not explicitly state a license, so the terms for reusing or redistributing the book content are unspecified.

Limitations & Caveats

As a static collection of book content, this repository does not offer executable code or interactive tools. The information reflects the state of Big Data technologies at the time of writing and may not include the latest advancements or best practices.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
