pygwalker by Kanaries

Interactive UI for Pandas dataframes in Jupyter

Created 2 years ago
15,196 stars

Top 3.2% on SourcePulse

Project Summary

PyGWalker is a Python library designed to simplify exploratory data analysis and visualization within Jupyter environments. It transforms pandas DataFrames into interactive user interfaces, enabling users to perform drag-and-drop visual analysis, data cleaning, and even natural language queries, effectively serving as an open-source alternative to Tableau for data scientists.

How It Works

PyGWalker connects the Jupyter Notebook environment to Graphic Walker, a visualization engine. A Python binding exposes Graphic Walker's capabilities, so users can build and adjust visualizations directly from their notebooks. An optional kernel_computation mode runs queries with DuckDB in the Python kernel, which improves performance when exploring larger datasets locally.
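To illustrate why computing in the kernel helps, here is a conceptual sketch that uses Python's built-in sqlite3 as a stand-in for DuckDB. It is not PyGWalker's code, and the function name is hypothetical. The point it shows is that the aggregation runs where the data lives, so only the small grouped result would need to reach the browser-based chart UI.

```python
# Conceptual sketch only: sqlite3 (stdlib) stands in for DuckDB, and none of this
# is PyGWalker's actual implementation. aggregate_in_kernel is a hypothetical name.
import sqlite3


def aggregate_in_kernel(rows, group_col, value_col):
    """Group rows (dicts) by group_col and sum value_col inside an in-memory DB."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE data (grp TEXT, val REAL)")
        con.executemany(
            "INSERT INTO data VALUES (?, ?)",
            ((r[group_col], r[value_col]) for r in rows),
        )
        # Only this grouped result (one row per category) would be sent to the UI,
        # instead of every raw row.
        cur = con.execute("SELECT grp, SUM(val) FROM data GROUP BY grp ORDER BY grp")
        return cur.fetchall()
    finally:
        con.close()


rows = [
    {"city": "A", "sales": 10.0},
    {"city": "B", "sales": 5.0},
    {"city": "A", "sales": 2.5},
]
print(aggregate_in_kernel(rows, "city", "sales"))  # [('A', 12.5), ('B', 5.0)]
```

With millions of input rows the result would still have one row per group, which is the kind of reduction that makes in-kernel computation attractive for large data.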

Quick Start & Requirements

  • Install via pip: pip install pygwalker
  • Dependencies: pandas. Optional: DuckDB for kernel_computation.
  • Tested environments include Jupyter Notebook, Google Colab, Kaggle, Streamlit, and more.
  • Official demos and tutorials are linked from the project page.
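A minimal Quick Start sketch follows, assuming `pip install pygwalker` has been run and the code executes in a Jupyter-style notebook. The helper names and the "data.csv" path are hypothetical placeholders; only `pyg.walk(df)` is PyGWalker's own entry point.

```python
# Minimal Quick Start sketch. Assumes pygwalker and pandas are installed and the
# code runs in a notebook. Helper names and "data.csv" are hypothetical.
import importlib.util


def pygwalker_available() -> bool:
    """Return True if the pygwalker package can be imported in this environment."""
    return importlib.util.find_spec("pygwalker") is not None


def explore_csv(csv_path: str):
    """Load a CSV with pandas and open it in the PyGWalker drag-and-drop UI."""
    import pandas as pd      # imported lazily so this helper can be defined anywhere
    import pygwalker as pyg

    df = pd.read_csv(csv_path)
    # pyg.walk renders the Graphic Walker interface for df in the notebook output.
    return pyg.walk(df)


# In a notebook cell:
# if pygwalker_available():
#     explore_csv("data.csv")
```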

Highlighted Details

  • Interactive UI for drag-and-drop visual analysis.
  • Supports natural language queries for data exploration.
  • The kernel_computation option uses DuckDB to handle larger datasets (up to 100 GB).
  • Integrates with Streamlit, so the same UI can be embedded in web applications.
  • Chart configurations can be saved and loaded via JSON files.
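The options above can be combined roughly as follows. This is a sketch assuming a recent PyGWalker release that accepts `spec` and `kernel_computation` in `pyg.walk` and provides `StreamlitRenderer` in `pygwalker.api.streamlit`; check the project README for your version. The file name "./chart_config.json", the helper names, and the 100,000-row threshold are hypothetical choices for illustration, not library defaults.

```python
# Sketch assuming a recent PyGWalker release. File names, helper names, and the
# row threshold below are hypothetical, chosen only for illustration.


def suggest_kernel_computation(n_rows: int, threshold: int = 100_000) -> bool:
    """Hypothetical heuristic: enable DuckDB-backed computation for large frames."""
    return n_rows >= threshold


def walk_with_options(df, spec_path: str = "./chart_config.json"):
    """Open PyGWalker in a notebook with a chart-config file and optional DuckDB."""
    import pygwalker as pyg

    return pyg.walk(
        df,
        spec=spec_path,  # path of the JSON file holding chart configurations
        kernel_computation=suggest_kernel_computation(len(df)),
    )


def streamlit_explorer(df, spec_path: str = "./chart_config.json"):
    """Embed the UI in a Streamlit script launched with `streamlit run`."""
    from pygwalker.api.streamlit import StreamlitRenderer

    renderer = StreamlitRenderer(df, spec=spec_path)
    renderer.explorer()  # renders the drag-and-drop explorer in the app page
```

Keeping the heuristic in a separate pure function makes the threshold easy to tune or replace; whether a fixed row count is a good signal depends on hardware and data shape, as the caveats below note.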

Maintenance & Community

  • Active development with regular releases.
  • Community support available via Discord and GitHub issues.
  • Links to tutorials, demos, and related projects (GWalkR, RATH) are provided.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

  • Chart configurations referenced by the spec parameter must be saved manually from the UI; autosave is planned for a future release.
  • While kernel_computation supports larger datasets, performance may vary based on hardware and data complexity.
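Because saving is manual for now, a script may want to check whether a spec file has actually been written before relying on it. The stdlib sketch below does that; the helper name is hypothetical, and the spec's internal schema (defined by Graphic Walker) is treated as opaque JSON.

```python
# Minimal stdlib sketch. read_saved_spec is a hypothetical helper; the spec's
# schema is not interpreted here, only parsed as JSON.
import json
from pathlib import Path
from typing import Any, Optional


def read_saved_spec(spec_path: str) -> Optional[Any]:
    """Return the parsed spec JSON, or None if nothing has been saved yet."""
    path = Path(spec_path)
    # A missing or empty file means the charts were never saved from the UI.
    if not path.is_file() or path.stat().st_size == 0:
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```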

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 130 stars in the last 30 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), John Resig (Author of jQuery; Chief Software Architect at Khan Academy), and 9 more.

lilac by databricks

0.1%
1k
Data exploration tool for LLM dataset curation and quality control
Created 2 years ago
Updated 1 year ago