data-formulator  by microsoft

AI app for iterative data visualization creation

Created 1 year ago
13,680 stars

Top 3.6% on SourcePulse

GitHubView on GitHub
Project Summary

Data Formulator is an AI-powered application designed to assist analysts in iteratively creating rich data visualizations. It combines a user interface with natural language processing, allowing users to specify visual encodings via drag-and-drop while delegating complex data transformations to AI agents. This approach aims to streamline the data visualization process by blending interactive design with intelligent data manipulation.

How It Works

Data Formulator leverages large language models (LLMs) to interpret user intent, expressed through both UI interactions and natural language prompts. When a user specifies visual encodings (e.g., mapping data fields to axes or colors), Data Formulator can generate SQL queries to transform the underlying data, even if the required fields are not directly present. This allows for dynamic data fetching and manipulation, enabling the creation of visualizations that require computations or joins. Recent updates enhance support for large datasets by integrating with DuckDB for local database operations.

Quick Start & Requirements

  • Install: pip install data_formulator
  • Run: data_formulator or python -m data_formulator
  • Prerequisites: OpenAI API key (or other supported LLMs via LiteLLM, including Azure, Ollama, Anthropic). Python 3.x.
  • Resources: Runs locally, with browser-based UI. Large data handling utilizes DuckDB.
  • Links: Releases, Codespaces, Development

Highlighted Details

  • Supports multiple LLM providers (OpenAI, Azure, Ollama, Anthropic) via LiteLLM.
  • Iterative data exploration and visualization through "Data Threads" and follow-up prompts.
  • Handles large datasets by loading them into a local DuckDB instance.
  • Experimental feature for parsing and cleaning messy text or images using AI.

Maintenance & Community

The project is actively developed by Microsoft Research, with frequent updates and new feature releases. Community interaction is encouraged via GitHub issues.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The effectiveness of data transformation and visualization generation is dependent on the chosen LLM's capabilities, particularly in code generation and instruction following. Users must provide API keys for supported LLMs.

Health Check
Last Commit

22 hours ago

Responsiveness

1 day

Pull Requests (30d)
4
Issues (30d)
3
Star History
356 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.