Table-to-LaTeX transformation toolkit
StructEqTable-Deploy provides a high-efficiency toolkit for converting table images into structured formats like LaTeX, HTML, and Markdown. It is designed for researchers and developers working with scientific publications, financial documents, or web pages containing tabular data, offering precise extraction and enabling downstream reasoning tasks.
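To make the three target formats concrete, the sketch below renders one small table as LaTeX, HTML, and Markdown. It is illustrative only: the function names (to_latex, to_html, to_markdown) and the sample data are hypothetical and are not part of the StructEqTable API. It shows the kind of output the toolkit aims for, not how the model produces it.

```python
# Illustrative only: hypothetical helpers, not part of StructEqTable.
from html import escape


def to_latex(header, rows):
    """Render a simple grid as a LaTeX tabular (cell text is not escaped)."""
    spec = "l" * len(header)  # one left-aligned column per header cell
    lines = [f"\\begin{{tabular}}{{{spec}}}", "\\hline"]
    lines.append(" & ".join(header) + " \\\\")
    lines.append("\\hline")
    for row in rows:
        lines.append(" & ".join(row) + " \\\\")
    lines += ["\\hline", "\\end{tabular}"]
    return "\n".join(lines)


def to_html(header, rows):
    """Render the same grid as a minimal HTML table with escaped cell text."""
    head = "".join(f"<th>{escape(c)}</th>" for c in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def to_markdown(header, rows):
    """Render the same grid as a pipe table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


# Made-up sample data for the demonstration.
header = ["Model", "Score"]
rows = [["A", "0.91"], ["B", "0.87"]]

print(to_latex(header, rows))
print(to_html(header, rows))
print(to_markdown(header, rows))
```

Of the three, the Markdown pipe table is the most limited, since it has no notation for cells that span rows or columns.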
How It Works
The system leverages large-scale, multi-modal data from the DocGenome benchmark, comprising over 2 million Image-LaTeX pairs across 156 disciplines. It employs end-to-end trained models, including InternVL2-1B and Pix2Struct-base variants, to precisely generate LaTeX descriptions from visual table inputs. This approach addresses challenges posed by complex headers and spanning cells, enhancing accuracy and broadening application scope.
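Spanning cells are where flat formats fall short, which is one reason LaTeX is a natural target. The sketch below uses a hypothetical helper named header_with_spans and made-up column labels to show how a two-level header is written with the standard LaTeX \multicolumn command. It illustrates the target notation only and is not code from the project.

```python
# Illustrative only: header_with_spans is a hypothetical helper.
def header_with_spans(groups):
    """Build one LaTeX table row from (label, span) pairs.

    A span of 1 emits the plain label; a larger span emits a
    centered \\multicolumn cell covering that many columns.
    """
    cells = []
    for label, span in groups:
        if span < 1:
            raise ValueError("span must be at least 1")
        if span == 1:
            cells.append(label)
        else:
            cells.append(f"\\multicolumn{{{span}}}{{c}}{{{label}}}")
    return " & ".join(cells) + " \\\\"


# Top row: "Accuracy" spans the two columns beneath it.
top = header_with_spans([("Method", 1), ("Accuracy", 2)])
# Second row: the sub-headers under "Accuracy".
sub = header_with_spans([("", 1), ("Dev", 1), ("Test", 1)])

print(top)
print(sub)
```

A model that emits LaTeX directly has to get both the span count and the column alignment right, which is the difficulty with complex headers that the paragraph above refers to.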
Quick Start & Requirements
Install from source (git clone, pip install -r requirements.txt, python setup.py develop) or from PyPI (pip install struct-eqtable).
Run the demo with python demo/demo.py, specifying the image path, checkpoint, and output format.
Highlighted Details
Maintenance & Community
The project has seen recent updates (late 2024) with new model releases and performance improvements. Contact is available via zhouhongbin@pjlab.org.cn for issues or questions.
Licensing & Compatibility
Released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The project is actively under development with a TODO list including expanding domain coverage and releasing pre-training/fine-tuning code. While TensorRT acceleration is noted, specific hardware requirements for optimal performance are not detailed.