ChartVLM by InternScience

Multi-modal foundation model for complex chart reasoning

Created 2 years ago
250 stars

Project Summary

InternScience/ChartVLM targets the evaluation and improvement of multi-modal large language models (MLLMs) on complex chart understanding and reasoning. It introduces ChartX, a large-scale benchmark dataset, and ChartVLM, a foundation model designed for interpretable chart and geometric image reasoning. The project gives researchers and practitioners an evaluation benchmark and a model reported to perform comparably to GPT-4V on chart reasoning.

How It Works

ChartVLM operates via a two-stage methodology. Initially, a base perception module processes chart images to extract structural data, such as converting charts into CSV format. Subsequently, cognition modules leverage this extracted structural information to perform higher-level tasks, including chart redrawing, generating descriptions, summarizing content, and answering specific questions. An integrated instruction adapter allows the model to dynamically select and execute tasks based on user prompts, thereby improving interpretability for chart-specific reasoning.
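The control flow described above can be sketched in a few lines of Python. This is only an illustration of the two-stage idea, not the project's code: every name here (perceive, route, cognition, run, and the task labels) is hypothetical, the perception stub returns fixed CSV instead of reading an image, and the keyword-based router is a stand-in for the instruction adapter, which in the real model is presumably learned rather than rule-based.

```python
# Hypothetical task label for the perception-only path.
PERCEPTION_TASK = "structural_extraction"


def perceive(chart_image: bytes) -> str:
    """Stand-in for the base perception module: chart image -> CSV text.

    A real model would decode the image; this stub returns fixed CSV.
    """
    return "year,sales\n2020,10\n2021,15\n"


def route(prompt: str) -> str:
    """Stand-in for the instruction adapter: choose a task from the prompt."""
    text = prompt.lower()
    if "csv" in text or "extract" in text:
        return PERCEPTION_TASK
    if "summar" in text:
        return "summarization"
    if "describe" in text or "description" in text:
        return "description"
    if "redraw" in text:
        return "redrawing"
    return "question_answering"


def cognition(task: str, csv_text: str, prompt: str) -> str:
    """Stand-in for the cognition modules, which consume the extracted CSV."""
    rows = [line.split(",") for line in csv_text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    if task == "summarization":
        return f"{len(body)} rows with columns {', '.join(header)}"
    return f"[{task}] answer derived from {len(body)} rows"


def run(chart_image: bytes, prompt: str) -> str:
    """Route the prompt, extract structure, then reason over it if needed."""
    task = route(prompt)
    csv_text = perceive(chart_image)  # stage 1: structural extraction
    if task == PERCEPTION_TASK:
        return csv_text  # the extracted CSV is itself the answer
    return cognition(task, csv_text, prompt)  # stage 2: higher-level task


print(run(b"", "Summarize this chart"))
```

The point of the split is that every higher-level answer is derived from an explicit intermediate table, which is what makes the reasoning inspectable.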

Quick Start & Requirements

To begin, clone the repository with git clone https://github.com/UniModal4Reasoning/ChartVLM.git and install the Python dependencies with pip install -r requirements.txt. Then download the pre-trained ChartVLM-base or ChartVLM-large checkpoints from Hugging Face and arrange them in the directory structure the README specifies. The README also links to the related paper, the project website, the ChartX dataset, and the ChartVLM models.

Highlighted Details

  • ChartX Benchmark: This extensive multi-modal dataset features 48,000 chart samples, covering 18 distinct chart types, 22 subject topics, and 7 analytical tasks. Tasks include structural extraction, question answering, and content summarization.
  • Performance: ChartVLM demonstrates state-of-the-art performance on chart reasoning tasks, achieving results comparable to GPT-4V and surpassing other specialized and general MLLMs on the ChartX benchmark.
  • Multi-modal Data: Each data instance pairs a chart image with its corresponding CSV data, Python code, and textual description, supporting both analysis and model training.
  • Evaluation Metrics: The project uses several metrics to evaluate performance across chart-related tasks: SCRM (for structural extraction), Exact Match (EM), GPT-accuracy, and GPT-score.
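
Of the metrics listed, Exact Match is the simplest to illustrate. The sketch below shows the general idea only: the function names are hypothetical, and the normalization (lowercasing and whitespace collapsing) is an assumption, not necessarily what the ChartX evaluation scripts do.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.strip().lower().split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


score = exact_match(["Paris", " 42 "], ["paris", "43"])
print(score)
```

The model-graded metrics (GPT-accuracy and GPT-score) depend on an external judge model and are not reproduced here.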

Maintenance & Community

The provided README snippet does not detail specific community channels, such as Discord or Slack, nor does it mention major contributors or sponsorships.

Licensing & Compatibility

The README snippet does not explicitly state the project's license or provide compatibility notes relevant to commercial use or integration with closed-source projects.

Limitations & Caveats

The provided documentation focuses on the capabilities and benchmark achievements of ChartVLM and ChartX, without explicitly detailing limitations. The project's stated aim to "pave the way for further exploration" suggests an ongoing research and development trajectory.

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days
