tabled by VikParuchuri

Table extraction library (deprecated, functionality moved to `marker`)

created 9 months ago
749 stars

Top 47.3% on sourcepulse

Project Summary

This library extracts tables from PDFs and images into Markdown, CSV, or HTML formats. It's designed for researchers and developers needing to process tabular data embedded in documents, offering automated detection, layout analysis, and cell formatting.
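
To make the three output formats concrete, here is a minimal sketch that serializes an already extracted table to Markdown, CSV, and HTML. It uses only the Python standard library and is not tabled's own API. The function names (`to_markdown`, `to_csv`, `to_html`) and the sample `rows` are assumptions made for illustration, and the first row is assumed to be the header.

```python
import csv
import html
import io


def to_markdown(rows):
    """Render rows as a pipe table; the first row is treated as the header."""
    def fmt(row):
        # Escape literal pipes so they do not split a cell.
        return "| " + " | ".join(c.replace("|", "\\|") for c in row) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in header) + " |"
    return "\n".join([fmt(header), separator] + [fmt(r) for r in body])


def to_csv(rows):
    """Render rows as CSV text; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def to_html(rows):
    """Render rows as a bare HTML table with escaped cell text."""
    header, *body = rows
    head = "".join(f"<th>{html.escape(c)}</th>" for c in header)
    body_rows = "".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in r) + "</tr>"
        for r in body
    )
    return f"<table><tr>{head}</tr>{body_rows}</table>"


# Hypothetical extracted table: a header row followed by data rows.
rows = [["Item", "Qty"], ["Apples", "3"], ["Pears", "5"]]
print(to_markdown(rows))
print(to_csv(rows))
print(to_html(rows))
```

Keeping the table as a plain list of rows until the last step means one extraction result can be written in any of the three formats.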

How It Works

Tabled leverages the Surya library for initial table detection within documents. It then employs a layout analysis model to identify rows and columns, followed by a recognition model to extract and format cell content. This multi-stage approach aims for high accuracy in parsing complex table structures.
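
The three stages described above can be sketched as a toy pipeline. This is not Surya or tabled code: the real stages are learned models, while the stand-ins below use simple coordinate matching on pre-positioned words. Every name here (`Word`, `detect_tables`, `analyze_layout`, `recognize_cells`, `extract_tables`) is hypothetical and only illustrates how the output of one stage feeds the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    x: float   # left edge of the word on the page
    y: float   # top edge of the word on the page
    text: str


def detect_tables(words):
    """Stage 1 stand-in: treat all words as one table and return its bounding box."""
    if not words:
        return []
    xs = [w.x for w in words]
    ys = [w.y for w in words]
    return [(min(xs), min(ys), max(xs), max(ys))]


def analyze_layout(words, bbox):
    """Stage 2 stand-in: each distinct y is a row, each distinct x is a column."""
    x0, y0, x1, y1 = bbox
    inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
    row_ys = sorted({w.y for w in inside})
    col_xs = sorted({w.x for w in inside})
    return inside, row_ys, col_xs


def recognize_cells(inside, row_ys, col_xs):
    """Stage 3 stand-in: place each word's text into its row/column cell."""
    grid = [["" for _ in col_xs] for _ in row_ys]
    for w in inside:
        grid[row_ys.index(w.y)][col_xs.index(w.x)] = w.text
    return grid


def extract_tables(words):
    """Run detection, layout analysis, and recognition in sequence."""
    return [
        recognize_cells(*analyze_layout(words, bbox))
        for bbox in detect_tables(words)
    ]


# Hypothetical page content: a 2x2 table laid out on a grid.
page = [
    Word(0, 0, "Item"), Word(50, 0, "Qty"),
    Word(0, 10, "Apples"), Word(50, 10, "3"),
]
print(extract_tables(page))
```

Separating the stages this way lets each one be swapped or tested on its own, which is the main appeal of a multi-stage design.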

Quick Start & Requirements

Highlighted Details

  • Achieves a 0.847 alignment score against GPT-4 table predictions.
  • Processes tables at an average of 0.029 seconds per table on an A10G GPU.
  • Supports PDF, image, Word, and PowerPoint inputs.
  • Offers a Streamlit GUI for interactive use.
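
As a rough worked example of what the quoted speed implies, the sketch below converts the 0.029 seconds-per-table average into throughput and batch time. It assumes the figure applies uniformly to every table and ignores model loading, file I/O, and any other per-document overhead, none of which the summary specifies. The function names are illustrative.

```python
SECONDS_PER_TABLE = 0.029  # average quoted above, measured on an A10G GPU


def tables_per_second(seconds_per_table=SECONDS_PER_TABLE):
    """Throughput implied by a per-table latency."""
    return 1.0 / seconds_per_table


def batch_seconds(n_tables, seconds_per_table=SECONDS_PER_TABLE):
    """Estimated time for n_tables, assuming the average holds for each one."""
    return n_tables * seconds_per_table


print(f"{tables_per_second():.1f} tables per second")
print(f"{batch_seconds(1000):.1f} seconds for 1000 tables")
```

Under these assumptions the average works out to roughly 34 tables per second, or about 29 seconds per 1,000 tables.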

Maintenance & Community

  • The project is deprecated, with functionality migrated to marker.
  • Community discussions are hosted on Discord.

Licensing & Compatibility

  • Model weights are licensed under CC-BY-NC-SA-4.0.
  • Commercial use is permitted for organizations under $5M USD in revenue and VC funding; other organizations can obtain a commercial license under the project's dual-licensing option.

Limitations & Caveats

The project is officially deprecated. Users should migrate to marker for continued development and support.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

11 stars in the last 90 days
