pygwalker  by Kanaries

Interactive UI for Pandas dataframes in Jupyter

created 2 years ago
15,059 stars

Top 3.4% on sourcepulse

Project Summary

PyGWalker is a Python library for exploratory data analysis and visualization in Jupyter environments. It turns a pandas DataFrame into an interactive interface for drag-and-drop visual analysis, data cleaning, and natural language queries, positioning itself as an open-source alternative to Tableau for data scientists.

How It Works

PyGWalker is a Python binding for Graphic Walker, a visualization engine, and embeds its interface in the Jupyter Notebook environment so that users can build and adjust charts interactively from their notebooks. With the kernel_computation option, computation is handled locally by DuckDB, which improves performance when exploring larger datasets.

Quick Start & Requirements

  • Install via pip: pip install pygwalker
  • Dependencies: pandas. Optional: DuckDB for kernel_computation.
  • Tested environments include Jupyter Notebook, Google Colab, Kaggle, Streamlit, and more.
  • Official demos and tutorials are available via provided links.
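
A minimal sketch of the quick start, assuming pygwalker and pandas are installed and the code runs in a Jupyter cell. The sample data and the helper names make_sample_frame and explore are made up for illustration; only pyg.walk comes from the library.

```python
import pandas as pd


def make_sample_frame() -> pd.DataFrame:
    """Build a small illustrative DataFrame (made-up data)."""
    return pd.DataFrame(
        {
            "city": ["Oslo", "Lima", "Pune", "Oslo"],
            "month": ["Jan", "Jan", "Feb", "Feb"],
            "sales": [120, 95, 143, 130],
        }
    )


def explore(df: pd.DataFrame):
    """Open the drag-and-drop UI for df (hypothetical wrapper around pyg.walk)."""
    # Imported lazily so the data helper above works without pygwalker installed.
    import pygwalker as pyg

    return pyg.walk(df)


df = make_sample_frame()
```

Calling explore(df) as the last statement of a notebook cell should display the interactive interface below that cell.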

Highlighted Details

  • Interactive UI for drag-and-drop visual analysis.
  • Supports natural language queries for data exploration.
  • The kernel_computation option uses DuckDB to handle larger datasets (up to 100 GB).
  • Integrates with Streamlit for web application deployment.
  • Chart configurations can be saved and loaded via JSON files.
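
The Streamlit integration mentioned above might look roughly like the sketch below. It assumes streamlit and pygwalker are installed and uses the StreamlitRenderer class from pygwalker.api.streamlit; the function name render_explorer and the file name gw_config.json are placeholders chosen for this example.

```python
def render_explorer(df, spec_path="./gw_config.json"):
    """Render the PyGWalker explorer for df inside a Streamlit page.

    render_explorer is a hypothetical wrapper; spec_path points at the JSON
    file holding saved chart configurations.
    """
    # Lazy import: the module still loads where pygwalker/streamlit are absent.
    from pygwalker.api.streamlit import StreamlitRenderer

    renderer = StreamlitRenderer(df, spec=spec_path)
    renderer.explorer()
```

Placed in a script (say app.py, a hypothetical name) together with a call such as render_explorer(df), this would be started with streamlit run app.py.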

Maintenance & Community

  • Active development with regular releases.
  • Community support available via Discord and GitHub issues.
  • Links to tutorials, demos, and related projects (GWalkR, RATH) are provided.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

  • Chart configurations are written to the file given by the spec parameter only when saved manually in the UI; autosave is planned for a future release.
  • While kernel_computation supports larger datasets, performance may vary based on hardware and data complexity.
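
As a rough illustration of the two options discussed above, the sketch below passes spec and kernel_computation to pyg.walk. The helpers walk_options and explore_with_saved_charts and the file name gw_config.json are assumptions made for this example, not part of the library.

```python
def walk_options(spec_path="./gw_config.json", large_data=False):
    """Collect keyword arguments for pyg.walk (hypothetical helper)."""
    # spec: JSON file that chart configurations are saved to and loaded from.
    options = {"spec": spec_path}
    if large_data:
        # Hand computation to DuckDB for larger datasets.
        options["kernel_computation"] = True
    return options


def explore_with_saved_charts(df, large_data=False):
    """Open the UI with chart persistence; intended for a Jupyter cell."""
    # Lazy import so walk_options can be used without pygwalker installed.
    import pygwalker as pyg

    return pyg.walk(df, **walk_options(large_data=large_data))
```

Because there is no autosave yet, charts built in the UI reach the JSON file only after the save button is clicked.
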

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1

Star History

  • 361 stars in the last 90 days
