lida by microsoft

Library for LLM-driven data visualization and infographic generation

created 2 years ago
3,093 stars

Top 15.8% on sourcepulse

Project Summary

LIDA is a Python library designed for the automatic generation of data visualizations and infographics using large language models (LLMs). It targets data scientists, researchers, and developers who need to quickly create, edit, explain, and evaluate visualizations from data, offering a grammar-agnostic approach compatible with various visualization libraries and LLM providers.

How It Works

LIDA treats visualizations as code, enabling programmatic generation, execution, editing, explanation, and repair. It first summarizes a given dataset, then generates potential visualization goals based on the summary (optionally with a persona), and finally creates visualization code (e.g., for Matplotlib, Seaborn, Altair) that can be executed. This approach allows for iterative refinement and analysis of visualizations through natural language commands.
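The three stages described above (summarize, propose goals, generate chart code) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LIDA's implementation or API: the function names `summarize`, `propose_goals`, and `generate_chart_code`, the prompt wording, and the canned `fake_llm` stand-in are all hypothetical, and a real run would call an actual LLM provider.

```python
import csv
import io
from typing import Callable, Dict, List, Optional


def _is_number(value: Optional[str]) -> bool:
    """Return True if the cell parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def summarize(csv_text: str) -> Dict:
    """Stage 1: reduce the dataset to a compact, prompt-friendly summary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = {}
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        numeric = bool(values) and all(_is_number(v) for v in values)
        fields[name] = "number" if numeric else "category"
    return {"n_rows": len(rows), "fields": fields}


def propose_goals(summary: Dict, generate: Callable[[str], str],
                  persona: Optional[str] = None) -> List[str]:
    """Stage 2: ask the text generator for goals, optionally with a persona."""
    prompt = f"Dataset summary: {summary}\n"
    if persona:
        prompt += f"Persona: {persona}\n"
    prompt += "Suggest one visualization goal per line."
    return [line.strip() for line in generate(prompt).splitlines() if line.strip()]


def generate_chart_code(summary: Dict, goal: str, generate: Callable[[str], str],
                        library: str = "matplotlib") -> str:
    """Stage 3: ask for source code in the chosen plotting library."""
    prompt = (f"Dataset summary: {summary}\nGoal: {goal}\n"
              f"Write {library} code that draws this chart.")
    return generate(prompt)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned text."""
    if "Suggest" in prompt:
        return "Compare mpg across origin\nShow the distribution of mpg"
    return (
        "import matplotlib.pyplot as plt\n"
        "def plot(data):\n"
        "    data.groupby('origin')['mpg'].mean().plot(kind='bar')\n"
        "    return plt\n"
    )


sample = "name,mpg,origin\ncivic,33.0,Japan\nmustang,18.5,USA\n"
summary = summarize(sample)
goals = propose_goals(summary, fake_llm, persona="fuel economy analyst")
code = generate_chart_code(summary, goals[0], fake_llm, library="matplotlib")
compile(code, "<generated chart>", "exec")  # syntax check only; nothing is executed
```

Because the chart is plain source text, it can be syntax-checked, edited, or repaired before it is ever executed, which is the property the "visualizations as code" framing relies on.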

Quick Start & Requirements

  • Install via pip: pip install -U lida
  • Requires Python 3.10 or higher.
  • Set an API key for your LLM provider (e.g., export OPENAI_API_KEY=<your key>).
  • Optional dependencies for infographics: pip install lida[infographics]
  • Tutorial notebook and web UI/API available.
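Before installing, it can help to confirm the two prerequisites listed above. The following is a small pre-flight sketch using only the standard library; the helper name `check_environment` is hypothetical, and it assumes the OpenAI provider (other providers would need a different variable).

```python
import os
import sys
from typing import List, Mapping, Optional, Tuple

MIN_PYTHON = (3, 10)          # minimum version stated in the requirements
API_KEY_VAR = "OPENAI_API_KEY"  # assumes the OpenAI provider is used


def check_environment(env: Optional[Mapping[str, str]] = None,
                      version: Optional[Tuple[int, int]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found; "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} or higher is required."
        )
    if not env.get(API_KEY_VAR):
        problems.append(f"{API_KEY_VAR} is not set.")
    return problems


for problem in check_environment():
    print("warning:", problem)
```

The `env` and `version` parameters exist only so the check can be exercised without changing the real environment.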

Highlighted Details

  • Supports multiple LLM providers: OpenAI, Azure OpenAI, PaLM, Cohere, Hugging Face.
  • Features include data summarization, goal generation, visualization generation, editing, explanation, evaluation, repair, and recommendation.
  • Infographic generation is in beta, leveraging Stable Diffusion models.
  • Can use locally hosted LLMs via Hugging Face or OpenAI-compatible endpoints.
  • Claims an error rate of < 3.5% on generated visualizations.

Maintenance & Community

  • Project paper accepted at ACL 2023.
  • Built on insights from Data2Vis.
  • Community examples and integrations (e.g., LIDA + Streamlit) are available.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

LIDA currently works best with datasets having a small number of columns (<= 10) due to LLM context limitations. It assumes datasets are preprocessed and suitable for loading into pandas DataFrames. Smaller LLMs may have limited instruction-following capabilities, with LIDA performing best with larger models like GPT-3.5/4. The infographic generation is experimental.
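One way to work within the column guideline is to trim a wide dataset before handing it over. The sketch below does this with the standard library only; `trim_columns` and its `keep` parameter are hypothetical helpers for illustration, not part of LIDA, and the default of 10 simply mirrors the threshold quoted above.

```python
import csv
import io
from typing import Iterable, Optional

MAX_COLUMNS = 10  # guideline from the text above, not a hard limit


def trim_columns(csv_text: str, keep: Optional[Iterable[str]] = None,
                 max_columns: int = MAX_COLUMNS) -> str:
    """Return CSV text restricted to at most `max_columns` columns.

    If `keep` is given, only those columns are candidates; original
    header order is preserved either way.
    """
    wanted = None if keep is None else set(keep)
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    selected = [c for c in header if wanted is None or c in wanted][:max_columns]

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=selected,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns outside `selected` are dropped
    return out.getvalue()


# A 12-column dataset reduced to the first 10 columns.
wide = (",".join(f"c{i}" for i in range(12)) + "\n"
        + ",".join(str(i) for i in range(12)) + "\n")
trimmed = trim_columns(wide)
```

Passing `keep` lets the caller choose the most relevant columns rather than relying on header order.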

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 79 stars in the last 90 days
