lida by microsoft

Library for LLM-driven data visualization and infographic generation

Created 2 years ago
3,137 stars

Top 15.3% on SourcePulse

Project Summary

LIDA is a Python library designed for the automatic generation of data visualizations and infographics using large language models (LLMs). It targets data scientists, researchers, and developers who need to quickly create, edit, explain, and evaluate visualizations from data, offering a grammar-agnostic approach compatible with various visualization libraries and LLM providers.

How It Works

LIDA treats visualizations as code, enabling programmatic generation, execution, editing, explanation, and repair. It first summarizes a given dataset, then generates potential visualization goals based on the summary (optionally with a persona), and finally creates visualization code (e.g., for Matplotlib, Seaborn, Altair) that can be executed. This approach allows for iterative refinement and analysis of visualizations through natural language commands.
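The summarize, goals, and code-generation stages described above can be sketched without any LLM. The example below is a minimal, standard-library illustration of that flow, not LIDA's implementation: `summarize`, `propose_goals`, `generate_code`, and `run_pipeline` are hypothetical names, the sample data is made up, and the model is replaced by fixed templates so the sketch runs offline.

```python
import csv
import io

# Made-up sample data, used only for illustration.
SAMPLE_CSV = """name,horsepower,origin
civic,158,Japan
mustang,310,USA
golf,147,Germany
"""


def summarize(csv_text):
    """Build a compact, prompt-sized description of the dataset."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = []
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        is_numeric = all(v.replace(".", "", 1).isdigit() for v in values)
        columns.append({
            "name": name,
            "type": "number" if is_numeric else "string",
            "samples": values[:2],
        })
    return {"n_rows": len(rows), "columns": columns}, rows


def propose_goals(summary):
    """Stand-in for the LLM step: one goal per numeric column."""
    return [
        f"Show the distribution of {col['name']}"
        for col in summary["columns"]
        if col["type"] == "number"
    ]


def generate_code(goal, column):
    """Stand-in for the LLM step: return visualization code as a string."""
    return (
        "def plot(rows):\n"
        f"    values = [float(r[{column!r}]) for r in rows]\n"
        f"    return {{'title': {goal!r}, 'kind': 'histogram', 'values': values}}\n"
    )


def run_pipeline(csv_text):
    """Chain the three stages, then execute the generated code."""
    summary, rows = summarize(csv_text)
    goals = propose_goals(summary)
    numeric = [c["name"] for c in summary["columns"] if c["type"] == "number"]
    code = generate_code(goals[0], numeric[0])
    namespace = {}
    exec(code, namespace)  # the chart is just code, so it can be run, edited, or repaired
    return summary, goals, code, namespace["plot"](rows)


summary, goals, code, chart = run_pipeline(SAMPLE_CSV)
print(goals)
print(chart)
```

Because the chart is plain source code, editing or repairing it amounts to rewriting the string and executing it again. Executing model-generated code in a real setting would call for sandboxing, which this sketch omits.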

Quick Start & Requirements

  • Install via pip: pip install -U lida
  • Requires Python 3.10 or higher.
  • Set an API key for your LLM provider (e.g., export OPENAI_API_KEY=<your key>).
  • Optional dependencies for infographics: pip install lida[infographics]
  • A tutorial notebook and a web UI/API are available.
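Before installing, the two hard requirements above (Python version and API key) can be checked with a short script. This is a minimal sketch: `check_environment` is a hypothetical helper, not part of LIDA, and it assumes the OpenAI provider, so other providers would need a different variable name.

```python
import os
import sys


def check_environment(version_info=sys.version_info, environ=os.environ):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or higher is required")
    # Assumption: the OpenAI provider is used; adjust the name for others.
    if not environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


for problem in check_environment():
    print("warning:", problem)
```

Passing `version_info` and `environ` as parameters keeps the check easy to exercise with fake values.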

Highlighted Details

  • Supports multiple LLM providers: OpenAI, Azure OpenAI, PaLM, Cohere, Hugging Face.
  • Features include data summarization, goal generation, visualization generation, editing, explanation, evaluation, repair, and recommendation.
  • Infographic generation is in beta, leveraging Stable Diffusion models.
  • Can use locally hosted LLMs via Hugging Face or OpenAI-compatible endpoints.
  • Claims an error rate of < 3.5% on generated visualizations.

Maintenance & Community

  • Project paper accepted at ACL 2023.
  • Built on insights from Data2Vis.
  • Community examples and integrations (e.g., LIDA + Streamlit) are available.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

LIDA currently works best with datasets having a small number of columns (<= 10) due to LLM context limitations. It assumes datasets are preprocessed and suitable for loading into pandas DataFrames. Smaller LLMs may have limited instruction-following capabilities, with LIDA performing best with larger models like GPT-3.5/4. The infographic generation is experimental.
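One way to stay within the column limit noted above is to trim a wide dataset before passing it on. The following is a hedged sketch under stated assumptions: `limit_columns` and `MAX_COLUMNS` are hypothetical names, rows are assumed to be a list of dictionaries sharing the same keys, and the selection rule (preferred columns first, then original order) is an illustrative choice rather than anything LIDA prescribes.

```python
MAX_COLUMNS = 10  # threshold taken from the limitation stated above


def limit_columns(rows, keep=None, max_columns=MAX_COLUMNS):
    """Return rows restricted to at most `max_columns` columns.

    `keep` lists preferred columns in priority order; remaining slots are
    filled with the other columns in their original order.
    """
    if not rows:
        return []
    available = list(rows[0].keys())
    preferred = [c for c in (keep or []) if c in available]
    rest = [c for c in available if c not in preferred]
    selected = (preferred + rest)[:max_columns]
    return [{c: row[c] for c in selected} for row in rows]


# A made-up 15-column row, trimmed to 10 with two columns prioritized.
wide = [{f"col{i}": i for i in range(15)}]
narrow = limit_columns(wide, keep=["col14", "col3"])
print(list(narrow[0].keys()))
```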

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 24 stars in the last 30 days
