tabled by VikParuchuri

Table extraction library (deprecated, functionality moved to `marker`)

Created 11 months ago
750 stars

Project Summary

This library extracts tables from PDFs and images into Markdown, CSV, or HTML. It is designed for researchers and developers who need to process tabular data embedded in documents, and it automates table detection, layout analysis, and cell formatting.
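
The following minimal sketch illustrates the three output formats by serializing a table that has already been extracted. It assumes the table is a list of rows of cell strings with the first row as the header. That representation and the names `to_markdown`, `to_csv`, and `to_html` are hypothetical and not part of tabled's API.

```python
import csv
import html
import io

# Hypothetical intermediate: rows of cell strings, first row = header.
# This is an illustration only, not tabled's internal data structure.
Table = list[list[str]]


def to_markdown(table: Table) -> str:
    """Render a GitHub-style Markdown table, escaping pipes and flattening newlines."""
    def fmt(row: list[str]) -> str:
        cells = [c.replace("|", "\\|").replace("\n", " ") for c in row]
        return "| " + " | ".join(cells) + " |"

    header, *body = table
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


def to_csv(table: Table) -> str:
    """Render CSV; the csv module quotes cells containing commas or quotes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(table)
    return buf.getvalue()


def to_html(table: Table) -> str:
    """Render a minimal HTML <table>, escaping cell text."""
    header, *body = table
    parts = ["<table>"]
    parts.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


if __name__ == "__main__":
    table = [["Method", "Score"], ["tabled", "0.847"]]
    print(to_markdown(table))
    print(to_csv(table))
    print(to_html(table))
```

Keeping one neutral grid representation and writing a small renderer per format means the extraction stages never need to know which output the user asked for.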

How It Works

Tabled leverages the Surya library for initial table detection within documents. It then employs a layout analysis model to identify rows and columns, followed by a recognition model to extract and format cell content. This multi-stage approach aims for high accuracy in parsing complex table structures.
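
The summary does not say how the row/column layout and the recognized text are merged into a final table. The sketch below shows one common approach under stated assumptions: the layout step returns non-overlapping row and column bands as boxes, and the recognition step returns each piece of text with a bounding box. Cells are formed by intersecting row and column bands, and each text box is assigned to the cell that contains its center. `Box` and `build_grid` are hypothetical names for illustration, not tabled's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page coordinates (hypothetical; not tabled's type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def intersect(self, other: "Box") -> Optional["Box"]:
        """Return the overlapping region, or None if the boxes do not overlap."""
        x0, y0 = max(self.x0, other.x0), max(self.y0, other.y0)
        x1, y1 = min(self.x1, other.x1), min(self.y1, other.y1)
        if x0 >= x1 or y0 >= y1:
            return None
        return Box(x0, y0, x1, y1)

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so a point on a shared edge belongs to only one cell.
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def build_grid(rows: list[Box], cols: list[Box],
               text_boxes: list[tuple[Box, str]]) -> list[list[str]]:
    """Combine row/column bands and recognized text into a grid of cell strings.

    rows / cols: bands from a layout step, ordered top-to-bottom / left-to-right.
    text_boxes: (box, text) pairs from a recognition step.
    """
    # Each cell is the intersection of one row band with one column band.
    cells = [[r.intersect(c) for c in cols] for r in rows]
    grid = [["" for _ in cols] for _ in rows]

    # Visit text in reading order so multi-line cell text is joined top-to-bottom.
    for box, text in sorted(text_boxes, key=lambda bt: (bt[0].y0, bt[0].x0)):
        cx, cy = box.center()
        target = next(
            ((i, j) for i, row in enumerate(cells) for j, cell in enumerate(row)
             if cell is not None and cell.contains(cx, cy)),
            None,
        )
        if target is None:
            continue  # text outside every cell (e.g. a caption) is dropped
        i, j = target
        grid[i][j] = f"{grid[i][j]} {text}".strip()
    return grid
```

Assigning text by its center point is a simple heuristic. Text that straddles a cell boundary, and spanning (merged) cells, would need extra handling that this sketch omits.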

Quick Start & Requirements

Highlighted Details

  • Achieves a 0.847 alignment score against GPT-4 table predictions.
  • Processes tables at an average of 0.029 seconds per table on an A10G GPU.
  • Supports PDF, image, Word, and PowerPoint inputs.
  • Offers a Streamlit GUI for interactive use.
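
The per-table timing converts to throughput with simple arithmetic. The sketch below assumes the 0.029 s average holds across a large batch on the same A10G GPU, and it ignores model loading, file I/O, and any stages the benchmark may not include. The helper names are hypothetical.

```python
# Reported average time per table on an A10G GPU.
SECONDS_PER_TABLE = 0.029


def tables_per_second(sec_per_table: float = SECONDS_PER_TABLE) -> float:
    """Throughput implied by a fixed average per-table latency."""
    return 1.0 / sec_per_table


def estimated_seconds(n_tables: int, sec_per_table: float = SECONDS_PER_TABLE) -> float:
    """Total time if every table takes the average per-table time."""
    return n_tables * sec_per_table


if __name__ == "__main__":
    print(f"{tables_per_second():.1f} tables/s")                   # 34.5 tables/s
    print(f"{estimated_seconds(10_000):.0f} s for 10,000 tables")  # 290 s for 10,000 tables
```

Real workloads will usually run slower than this estimate once document loading and table detection are counted.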

Maintenance & Community

  • The project is deprecated, with functionality migrated to `marker`.
  • Community discussions are hosted on Discord.

Licensing & Compatibility

  • Model weights are licensed under CC-BY-NC-SA-4.0.
  • Commercial use of the weights is permitted for organizations with under $5M USD in both revenue and VC funding; organizations above either threshold need a commercial license.
  • Dual-licensing options are available for commercial use.

Limitations & Caveats

The project is officially deprecated, and users are advised to migrate to `marker` for continued development and support.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
