data-formulator  by microsoft

AI app for iterative data visualization creation

created 1 year ago
12,754 stars

Top 4.0% on sourcepulse

Project Summary

Data Formulator is an AI-powered application that helps analysts create rich data visualizations iteratively. It pairs a drag-and-drop interface with natural language prompts: users specify visual encodings directly in the UI and delegate complex data transformations to AI agents. The goal is to streamline visualization work by blending interactive design with AI-driven data manipulation.

How It Works

Data Formulator uses large language models (LLMs) to interpret user intent, expressed through both UI interactions and natural language prompts. When a user specifies visual encodings (e.g., mapping data fields to axes or colors), Data Formulator can generate SQL queries that transform the underlying data, even when the requested fields do not yet exist in it. This makes it possible to build visualizations that depend on computed values or joins. Recent updates improve support for large datasets by integrating with DuckDB for local database operations.
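
The transformation step described above can be illustrated with a minimal sketch. It is not Data Formulator's actual code: the `sales` table, the `profit` field, the `GENERATED_SQL` string, and the `run_transformation` helper are all hypothetical, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra installs. It shows a field that is absent from the data being derived by a SQL query before charting.

```python
import sqlite3

# Stand-in for the local database. The source names DuckDB; sqlite3 is used
# here only because it ships with Python (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 120.0, 80.0), ("East", 90.0, 70.0), ("West", 200.0, 150.0)],
)

# Hypothetical query of the kind an AI agent might produce when the user
# maps a "profit" field, which is not in the table, to the y-axis.
GENERATED_SQL = """
SELECT region, SUM(revenue - cost) AS profit
FROM sales
GROUP BY region
ORDER BY region
"""


def run_transformation(connection, sql):
    """Run the query and return the rows as chart-ready dictionaries."""
    cursor = connection.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


chart_rows = run_transformation(conn, GENERATED_SQL)
print(chart_rows)
```

The resulting rows carry the derived `profit` column alongside `region`, so a charting layer only needs to bind existing columns to axes.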

Quick Start & Requirements

  • Install: pip install data_formulator
  • Run: data_formulator or python -m data_formulator
  • Prerequisites: Python 3.x and an API key for OpenAI or another LLM provider supported through LiteLLM (including Azure, Ollama, Anthropic).
  • Resources: Runs locally, with browser-based UI. Large data handling utilizes DuckDB.

Highlighted Details

  • Supports multiple LLM providers (OpenAI, Azure, Ollama, Anthropic) via LiteLLM.
  • Iterative data exploration and visualization through "Data Threads" and follow-up prompts.
  • Handles large datasets by loading them into a local DuckDB instance.
  • Experimental feature for parsing and cleaning messy text or images using AI.
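
The idea of iterating through follow-up prompts can be sketched as a chain of steps, each building on the result of the one before. This is only an illustration under stated assumptions: `ThreadStep`, `total_by_year`, and `above_100` are hypothetical names, and the hand-written transform functions stand in for whatever an AI agent would generate. It is not how Data Formulator implements Data Threads.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]


@dataclass
class ThreadStep:
    """One step in a hypothetical data thread: a prompt plus its transform."""

    prompt: str
    transform: Callable[[List[Row]], List[Row]]
    parent: Optional["ThreadStep"] = None

    def result(self, source: List[Row]) -> List[Row]:
        # A follow-up step starts from its parent's output, not the raw data.
        base = self.parent.result(source) if self.parent else source
        return self.transform(base)

    def history(self) -> List[str]:
        # Prompts from the first step down to this one.
        earlier = self.parent.history() if self.parent else []
        return earlier + [self.prompt]


def total_by_year(rows: List[Row]) -> List[Row]:
    """Stand-in for an AI-generated aggregation."""
    totals: Dict[int, int] = {}
    for row in rows:
        totals[row["year"]] = totals.get(row["year"], 0) + row["sales"]
    return [{"year": year, "sales": sales} for year, sales in sorted(totals.items())]


def above_100(rows: List[Row]) -> List[Row]:
    """Stand-in for an AI-generated filter."""
    return [row for row in rows if row["sales"] > 100]


data = [
    {"year": 2022, "sales": 60},
    {"year": 2022, "sales": 50},
    {"year": 2023, "sales": 90},
]

step1 = ThreadStep("total sales by year", total_by_year)
step2 = ThreadStep("keep only years above 100", above_100, parent=step1)

print(step2.history())
print(step2.result(data))
```

Because each step keeps a link to its parent, an earlier result such as `step1` remains available as a starting point for a different follow-up.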

Maintenance & Community

The project is actively developed by Microsoft Research, with frequent updates and new feature releases. Community interaction is encouraged via GitHub issues.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

The effectiveness of data transformation and visualization generation is dependent on the chosen LLM's capabilities, particularly in code generation and instruction following. Users must provide API keys for supported LLMs.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 1,386 stars in the last 90 days
