ChartVLM by InternScience

Multi-modal foundation model for complex chart reasoning

Created 2 years ago

260 stars


Project Summary

InternScience/ChartVLM provides tools for evaluating and improving how well multi-modal large language models (MLLMs) understand and reason about complex charts. It introduces ChartX, a large-scale benchmark dataset, and ChartVLM, a foundation model designed for interpretable reasoning over charts and geometric images. For researchers and practitioners, the project offers rigorous evaluation tools and a model reported to perform comparably to GPT-4V, addressing a gap in current MLLM capabilities.

How It Works

ChartVLM uses a two-stage design. First, a base perception module extracts the chart's underlying structural data from the image, for example by converting the chart into CSV. Second, cognition modules use this structured data for higher-level tasks: redrawing the chart, generating descriptions, summarizing content, and answering questions. An instruction adapter selects which task to run based on the user's prompt, which makes the model's chart reasoning more interpretable.
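The two-stage flow can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `perceive` returns a fixed CSV string in place of the learned perception module, the handler names are invented, summarization is folded into the description handler for brevity, and `route_instruction` is a keyword router standing in for ChartVLM's learned instruction adapter. It shows only the data flow (image -> CSV -> task-specific reasoning), not the repository's actual API.

```python
import csv
import io
from typing import Callable, Dict, List


# Hypothetical stand-in for the base perception module. In ChartVLM this is a
# learned model that reads a chart image; here it just returns a fixed CSV string.
def perceive(chart_image: bytes) -> str:
    return "Year,Sales\n2021,120\n2022,150\n2023,180\n"


def parse_csv(table: str) -> List[List[str]]:
    """Split the CSV produced by the perception stage into rows."""
    return list(csv.reader(io.StringIO(table)))


# Hypothetical cognition handlers: each consumes the structured table, not the image.
def answer_max(table: str) -> str:
    header, *rows = parse_csv(table)
    best = max(rows, key=lambda r: float(r[1]))
    return f"{header[0]} {best[0]} has the highest {header[1]} ({best[1]})"


def describe(table: str) -> str:
    header, *rows = parse_csv(table)
    return f"The chart shows {header[1]} across {len(rows)} values of {header[0]}."


def redraw_code(table: str) -> str:
    # Emit matplotlib-style plotting code as text (it is not executed here).
    header, *rows = parse_csv(table)
    xs = [r[0] for r in rows]
    ys = [float(r[1]) for r in rows]
    return f"plt.plot({xs!r}, {ys!r}); plt.xlabel({header[0]!r}); plt.ylabel({header[1]!r})"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "qa": answer_max,
    "description": describe,
    "redraw": redraw_code,
}


# Crude keyword router used as a substitute for the learned instruction adapter.
def route_instruction(prompt: str) -> str:
    p = prompt.lower()
    if "redraw" in p or "code" in p:
        return "redraw"
    if "describe" in p or "summar" in p:
        return "description"
    return "qa"


def run(chart_image: bytes, prompt: str) -> str:
    table = perceive(chart_image)     # stage 1: image -> structured CSV
    task = route_instruction(prompt)  # pick the downstream task from the prompt
    return HANDLERS[task](table)      # stage 2: reason over the CSV, not the pixels
```

For example, `run(b"", "Which year has the highest sales?")` routes to the QA handler and returns "Year 2023 has the highest Sales (180)". Because every cognition handler works on the intermediate CSV, each step of the reasoning can be inspected, which is the interpretability benefit the two-stage design aims for.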

Quick Start & Requirements

To get started, clone the repository with git clone https://github.com/UniModal4Reasoning/ChartVLM.git and install the Python dependencies with pip install -r requirements.txt. Then download the pre-trained ChartVLM-base or ChartVLM-large checkpoints from Hugging Face and arrange them in the directory structure specified in the repository README. Key resources include the related paper, the project website, the ChartX dataset, and the ChartVLM models.

Highlighted Details

ChartX Benchmark: This extensive multi-modal dataset features 48,000 chart samples, covering 18 distinct chart types, 22 subject topics, and 7 analytical tasks. Tasks include structural extraction, question answering, and content summarization.
Performance: ChartVLM demonstrates state-of-the-art performance on chart reasoning tasks, achieving results comparable to GPT-4V and surpassing other specialized and general MLLMs on the ChartX benchmark.
Multi-modal Data: Each data instance pairs a chart image with its underlying CSV data, the Python code used to plot it, and a textual description, supporting both comprehensive analysis and model training.
Evaluation Metrics: The project uses a suite of metrics, including SCRM for structural extraction, Exact Match (EM), GPT-accuracy, and GPT-score, to evaluate performance across the different chart tasks.
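To make the data layout and the simpler metrics concrete, here is a minimal Python sketch. The `ChartRecord` field names are assumptions, not ChartX's actual schema. `exact_match` is a plain normalized string comparison, and `table_match` is a simplified, hypothetical cell-level comparison with a relative numeric tolerance; it is NOT the official SCRM metric, whose definition should be taken from the paper or repository. The GPT-based metrics are omitted because they require an LLM judge.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ChartRecord:
    """Hypothetical layout of one ChartX-style instance; field names are assumed."""
    image_path: str   # rendered chart image
    chart_type: str   # e.g. "line", "bar"
    csv_table: str    # underlying data table
    code: str         # plotting code that produced the image
    description: str  # textual description of the chart
    question: str
    answer: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing strings."""
    return " ".join(text.strip().lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    return normalize(prediction) == normalize(reference)


def csv_to_cells(table: str) -> Dict[Tuple[str, str], str]:
    """Map (row label, column header) -> cell value for a simple CSV table."""
    rows = list(csv.reader(io.StringIO(table.strip())))
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    cells = {}
    for row in body:
        if not row:
            continue
        for col, value in zip(header[1:], row[1:]):
            cells[(normalize(row[0]), normalize(col))] = value.strip()
    return cells


def table_match(pred_csv: str, ref_csv: str, rel_tol: float = 0.05) -> float:
    """Fraction of reference cells recovered within a relative tolerance.

    A simplified stand-in for structural-extraction scoring, NOT the official SCRM.
    """
    pred, ref = csv_to_cells(pred_csv), csv_to_cells(ref_csv)
    if not ref:
        return 0.0
    hits = 0
    for key, ref_value in ref.items():
        if key not in pred:
            continue  # missing cell counts as a miss
        try:
            r, p = float(ref_value), float(pred[key])
            hits += abs(p - r) <= rel_tol * abs(r)  # numeric cell: tolerance check
        except ValueError:
            hits += normalize(pred[key]) == normalize(ref_value)  # text cell
    return hits / len(ref)


def evaluate(record: ChartRecord, pred_csv: str, pred_answer: str) -> Dict[str, float]:
    return {
        "table_match": table_match(pred_csv, record.csv_table),
        "qa_exact_match": float(exact_match(pred_answer, record.answer)),
    }
```

With a reference table of 120, 150, and 180 and a prediction of 121, 150, and 200, two of three cells fall within the 5% tolerance, so `table_match` returns about 0.67. A tolerance-based score like this rewards near-correct numeric extraction that strict string matching would reject.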

Maintenance & Community

The provided README snippet does not detail specific community channels, such as Discord or Slack, nor does it mention major contributors or sponsorships.

Licensing & Compatibility

The README snippet does not explicitly state the project's license or provide compatibility notes relevant to commercial use or integration with closed-source projects.

Limitations & Caveats

The provided documentation focuses on the capabilities and benchmark achievements of ChartVLM and ChartX, without explicitly detailing limitations. The project's stated aim to "pave the way for further exploration" suggests an ongoing research and development trajectory.


Explore Similar Projects

OneChart by LingyvKong

ChartLlama-code by tingxueronghua

ChartQA by vis-nlp

chat2plot by nyanp

ReasonGraph by ZongqianLi

MathVista by lupantech

ThoughtSource by OpenBioLink

VMind by VisActor

G-Retriever by XiaoxinHe

Data-Analysis-Agent by Zafer-Liu

Rath by Kanaries

smart-excalidraw-next by liujuntao123