pygwalker by Kanaries

Interactive UI for Pandas dataframes in Jupyter

Created 2 years ago

15,560 stars

Top 3.2% on SourcePulse

10 Experts Love This Project

Tobi Lutke

Cofounder of Shopify

Dominik Moritz

Research Scientist at Apple; Professor at CMU

Tomas Valenta

Cofounder of E2B

Jonathan Ragan-Kelley

Professor at MIT

and 6 more!

Project Summary

PyGWalker is a Python library designed to simplify exploratory data analysis and visualization within Jupyter environments. It transforms pandas DataFrames into interactive user interfaces, enabling users to perform drag-and-drop visual analysis, data cleaning, and even natural language queries, effectively serving as an open-source alternative to Tableau for data scientists.

How It Works

PyGWalker connects the Jupyter Notebook environment to Graphic Walker, a visualization engine, through a Python binding, so users can build and adjust visualizations interactively without leaving the notebook. The optional kernel_computation mode runs queries in DuckDB on the local machine, which improves performance when exploring larger datasets.

Quick Start & Requirements

Install via pip: pip install pygwalker
Dependencies: pandas. Optional: DuckDB for kernel_computation.
Tested environments include Jupyter Notebook, Google Colab, Kaggle, Streamlit, and more.
Official demos and tutorials are linked from the project repository.
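
The quick start above can be sketched as follows, assuming pygwalker and pandas are installed in the notebook kernel. `pyg.walk` and its `spec` and `kernel_computation` parameters come from the library; the helper names `walk_kwargs` and `explore` and the file name `gw_config.json` are hypothetical and exist only for illustration.

```python
def walk_kwargs(spec_path=None, use_kernel=False):
    """Build keyword arguments for pygwalker.walk (hypothetical helper)."""
    kwargs = {}
    if spec_path is not None:
        # Path of the JSON file that stores chart configurations.
        kwargs["spec"] = spec_path
    if use_kernel:
        # Run queries in the Python kernel (DuckDB) for larger datasets.
        kwargs["kernel_computation"] = True
    return kwargs


def explore(df, spec_path=None, use_kernel=False):
    """Open the PyGWalker UI for a pandas DataFrame inside a notebook."""
    import pygwalker as pyg  # imported lazily; requires `pip install pygwalker`
    return pyg.walk(df, **walk_kwargs(spec_path, use_kernel))
```

In a notebook cell, calling `explore(df)` on an existing DataFrame should render the drag-and-drop UI; `explore(df, spec_path="./gw_config.json", use_kernel=True)` additionally points at a chart configuration file and turns on kernel_computation.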

Highlighted Details

Interactive UI for drag-and-drop visual analysis.
Supports natural language queries for data exploration.
kernel_computation option enables DuckDB for larger datasets (up to 100 GB).
Integrates seamlessly with Streamlit for web application deployment.
Chart configurations can be saved and loaded via JSON files.
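
As a rough illustration of the Streamlit integration and JSON-based chart configuration listed above, the sketch below embeds the explorer in a Streamlit page. It assumes streamlit, pygwalker, and pandas are installed; `StreamlitRenderer` is imported from `pygwalker.api.streamlit`, while `SPEC_PATH`, `main`, and the tiny dataset are made up for the example.

```python
SPEC_PATH = "./gw_config.json"  # hypothetical location of the saved chart config


def main():
    """Render the PyGWalker explorer inside a Streamlit page."""
    # Lazy imports: need `pip install streamlit pygwalker pandas`.
    import pandas as pd
    import streamlit as st
    from pygwalker.api.streamlit import StreamlitRenderer

    st.set_page_config(page_title="PyGWalker demo", layout="wide")

    # Small made-up dataset, only to have something to plot.
    df = pd.DataFrame({"city": ["A", "B", "C"], "sales": [10, 25, 17]})

    # spec points at the JSON file holding chart configurations.
    renderer = StreamlitRenderer(df, spec=SPEC_PATH)
    renderer.explorer()
```

To try it, one would save the block as a script (say `app.py`, a hypothetical name), add a final line that calls `main()`, and start it with `streamlit run app.py`.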

Maintenance & Community

Active development with regular releases.
Community support available via Discord and GitHub issues.
Links to tutorials, demos, and related projects (GWalkR, RATH) are provided.

Licensing & Compatibility

Licensed under the Apache License 2.0.
Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

Chart configurations passed through the spec parameter are not saved automatically; they must be saved manually from the UI. Autosave is planned for a future release.
While kernel_computation supports larger datasets, performance may vary based on hardware and data complexity.
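
Because saving is manual, a spec file may be missing or empty until the user saves from the UI. One possible guard, using only the standard library, is sketched below. The function name `spec_is_saved` is hypothetical, and the check assumes nothing about the file's schema beyond it being non-empty JSON.

```python
import json
import os


def spec_is_saved(spec_path):
    """Return True if spec_path exists and holds non-empty, parseable JSON."""
    if not os.path.isfile(spec_path):
        return False
    try:
        with open(spec_path, "r", encoding="utf-8") as fh:
            content = json.load(fh)
    except (OSError, ValueError):
        # Unreadable file or invalid JSON (JSONDecodeError is a ValueError).
        return False
    # An empty object or list means nothing has been saved yet.
    return bool(content)
```

A notebook could call this before sharing or committing the configuration file, and remind the user to save from the UI when it returns False.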

Health Check

Last Commit

1 week ago

Responsiveness

1 day

Star History

103 stars in the last 30 days