simple-data-analysis  by nshiab

JS library for data analysis

created 3 years ago
307 stars

Project Summary

This JavaScript library provides a high-performance, easy-to-use interface for data analysis, targeting developers who want to perform complex data manipulations and visualizations directly within the JavaScript ecosystem. It aims to bridge the gap between data preparation (often done in Python/R) and front-end visualization, enabling full-stack JavaScript data workflows.

How It Works

The library leverages DuckDB, an in-process analytical database, for its core data processing capabilities. It utilizes duckdb-node and duckdb-wasm to enable execution in both Node.js and browser environments. Geospatial operations are supported via the duckdb_spatial extension. The API design is inspired by popular Python (Pandas) and R (Tidyverse) libraries, offering both SQL query execution and JavaScript-based data manipulation methods.
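To make the dual SQL/JavaScript interface more concrete, here is a minimal TypeScript sketch of how a JavaScript-style method call could be turned into a SQL string for an engine such as DuckDB to execute. The names `SummarizeOptions`, `buildSummarizeQuery`, and `quoteIdentifier` are hypothetical and exist only for this illustration; they are not the library's API, and the sketch makes no claim about how the library builds its queries internally.

```typescript
// Hypothetical option shape; these names are assumptions for illustration only.
interface SummarizeOptions {
  table: string;
  value: string;
  category: string;
  summary: "mean" | "sum" | "min" | "max" | "count";
}

// Map friendly summary names to standard SQL aggregate functions.
const SQL_AGGREGATES: Record<SummarizeOptions["summary"], string> = {
  mean: "AVG",
  sum: "SUM",
  min: "MIN",
  max: "MAX",
  count: "COUNT",
};

// Quote an identifier, doubling any embedded double quotes.
function quoteIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build a GROUP BY query from a plain JavaScript options object.
function buildSummarizeQuery(options: SummarizeOptions): string {
  const table = quoteIdentifier(options.table);
  const value = quoteIdentifier(options.value);
  const category = quoteIdentifier(options.category);
  const aggregate = SQL_AGGREGATES[options.summary];
  const alias = quoteIdentifier(options.summary);
  return (
    `SELECT ${category}, ${aggregate}(${value}) AS ${alias} ` +
    `FROM ${table} GROUP BY ${category} ORDER BY ${category}`
  );
}

const query = buildSummarizeQuery({
  table: "temperatures",
  value: "temp",
  category: "city",
  summary: "mean",
});
console.log(query);
// SELECT "city", AVG("temp") AS "mean" FROM "temperatures" GROUP BY "city" ORDER BY "city"
```

The point of the sketch is the division of labour: the caller works with ordinary JavaScript objects, while the heavy computation is expressed as SQL and left to the database engine.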

Quick Start & Requirements

  • Installation: deno -A jsr:@nshiab/setup-sda (Deno) or npx setup-sda (Node.js) to set up a new project; bunx jsr add @nshiab/simple-data-analysis (Bun) to add the library as a dependency.
  • Prerequisites: Deno >= 2.x.x, Node.js >= 22.6.x.
  • Documentation: JSR

Highlighted Details

  • Reports significant performance advantages over Pandas and Tidyverse on large datasets (1.7 GB CSV, 1 billion rows) in specific benchmarks.
  • Caches fetched and computed data in .sda-cache, which speeds up iterative workflows.
  • Supports geospatial data analysis, including creating geometries and performing spatial joins.
  • Offers flexibility with direct SQL query execution and JavaScript data processing (updateWithJS).
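The caching idea in the list above can be illustrated with a small TypeScript sketch. It is an in-memory, synchronous stand-in, assuming a string key per computation and an optional time-to-live; the library's own cache lives in .sda-cache and its internals are not described here, so the `ComputationCache` class, the `getOrCompute` method, and the `ttlMs` parameter are all hypothetical.

```typescript
// Hypothetical cache entry: the stored value plus the time it was stored.
interface CacheEntry {
  value: unknown;
  storedAt: number;
}

// In-memory sketch only; a real cache for data files would persist to disk.
class ComputationCache {
  private entries = new Map<string, CacheEntry>();

  // Return the cached value for `key` if it is still fresh, otherwise
  // run `compute`, store its result, and return it.
  // `ttlMs` is how long an entry stays valid (Infinity: never expires).
  // `now` is injectable so that expiry can be checked deterministically.
  getOrCompute<T>(
    key: string,
    compute: () => T,
    ttlMs: number = Infinity,
    now: () => number = () => Date.now(),
  ): T {
    const entry = this.entries.get(key);
    if (entry !== undefined && now() - entry.storedAt < ttlMs) {
      return entry.value as T; // cache hit: skip the expensive work
    }
    const value = compute(); // cache miss or expired entry
    this.entries.set(key, { value, storedAt: now() });
    return value;
  }
}

const cache = new ComputationCache();
let computeCalls = 0;
const expensiveSum = (): number => {
  computeCalls += 1;
  return [1, 2, 3].reduce((total, n) => total + n, 0);
};

const first = cache.getOrCompute("sum", expensiveSum);
const second = cache.getOrCompute("sum", expensiveSum);
console.log(first, second, computeCalls); // 6 6 1
```

The second call returns the stored result without re-running the computation, which is the effect that makes repeated runs of an analysis script faster.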

Maintenance & Community

  • Maintained by Nael Shiab, a computational journalist at CBC News.
  • Community engagement is encouraged via issues and conversations.

Licensing & Compatibility

  • The library is published on JSR. The README does not explicitly state a license, and availability on JSR does not by itself indicate a permissive one.

Limitations & Caveats

  • The reported benchmarks were run on a MacBook Pro (M1 Pro); results may differ on other hardware.
  • The library is primarily focused on analytical database operations; complex statistical modeling or machine learning algorithms are not its core focus.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 13
  • Issues (30d): 18
  • Star history: 24 stars in the last 90 days
