TableLLM by RUCKBReasoning

LLM for tabular data manipulation in office scenarios

Created 2 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

TableLLM addresses tabular data manipulation within real-world office scenarios, targeting users needing to interact with data embedded in spreadsheets and documents. It offers a dual approach: generating code for spreadsheet operations and direct text answers for document-based queries, enhancing productivity and data accessibility.

How It Works

TableLLM employs a dual-strategy architecture. For spreadsheet data, it generates Python code for operations such as insert, delete, update, query, merge, and chart creation. For document data, it generates direct text answers, primarily for query tasks on short tables. The model, TableLLM-8B, is fine-tuned from Llama3.1-8B to understand and manipulate tabular data across these office formats.
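
To make the spreadsheet path concrete, here is a minimal sketch of the kind of Python code such a model might emit for five of the listed operations. It assumes pandas as the table library, and the tables, column names, and values are invented for illustration; none of this is taken from TableLLM's actual output or prompts. Chart creation is left out to keep the sketch free of plotting dependencies.

```python
import pandas as pd

# Hypothetical sample tables; names and values are illustrative only.
sales = pd.DataFrame(
    {"region": ["North", "South", "East"], "units": [120, 95, 143]}
)
managers = pd.DataFrame(
    {
        "region": ["North", "South", "East", "West"],
        "manager": ["Ada", "Ben", "Cy", "Dee"],
    }
)

# Insert: append a new row.
sales = pd.concat(
    [sales, pd.DataFrame({"region": ["West"], "units": [88]})],
    ignore_index=True,
)

# Update: change a value selected by a condition.
sales.loc[sales["region"] == "South", "units"] = 101

# Delete: drop the rows that match a condition.
sales = sales[sales["region"] != "East"].reset_index(drop=True)

# Query: filter rows and return one column as a list.
high_volume = sales.loc[sales["units"] > 100, "region"].tolist()

# Merge: left-join with a second table on a shared key.
merged = sales.merge(managers, on="region", how="left")

print(high_volume)
print(merged)
```

In the document path, by contrast, the model would return the answer as plain text rather than code to execute.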

Quick Start & Requirements

Install dependencies with pip install -r requirements.txt. The repository provides inference scripts (inference_code.py, inference_text.py) and model checkpoints. Deployment is supported via vLLM and requires MongoDB to be set up and configured. The README also points to the project paper, homepage, model, and training set.
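
As a rough illustration of the vLLM route, the sketch below builds (but does not send) a request for vLLM's OpenAI-compatible `/v1/completions` endpoint. It covers only a plain vLLM server, not the MongoDB-backed deployment. The prompt layout, the `build_request` and `table_to_csv` helpers, the model path, and the host and port are all assumptions made for this example; the project's real prompt templates live in its inference scripts.

```python
import csv
import io
import json
import urllib.request


def table_to_csv(header, rows):
    """Serialize a small table to CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def build_request(
    question,
    header,
    rows,
    base_url="http://localhost:8000",     # assumed server address
    model="path/to/TableLLM-checkpoint",  # placeholder, not a real model ID
):
    """Build a POST request for an OpenAI-compatible completions endpoint."""
    # Assumed prompt layout; the project's own templates may differ.
    prompt = (
        "Table (CSV):\n"
        + table_to_csv(header, rows)
        + "\nQuestion: "
        + question
        + "\nAnswer:"
    )
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 256,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        base_url + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(
    "Which region sold the most units?",
    ["region", "units"],
    [["North", 120], ["South", 95]],
)
print(req.full_url)
```

Once a server is running, the request could be sent with `urllib.request.urlopen(req)` and the JSON response read from the returned object.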

Highlighted Details

  • TableLLM-8B reports an average score of 86.7 across benchmarks including WikiTQ, TAT-QA, FeTaQA, WikiSQL, and Spider, ahead of GPT-4o (84.8) and GPT-3.5 (69.8).
  • It excels in text generation for table QA, scoring 89.1 on WikiTQ, 89.5 on TAT-QA, and 93.36 on FeTaQA.
  • Code generation capabilities are robust, with scores of 89.6 on WikiSQL and 81.1 on Spider.
  • A custom benchmark covers insert, delete, update, query, merge, and chart operations.

Maintenance & Community

Recent updates include code, dataset, and model checkpoint releases (August 2025), and acceptance to ACL 2025 Findings. Frontend/backend deployment code was open-sourced (June 2024). Contact is via GitHub issues or email (zhang2718@ruc.edu.cn, luosijia0906@ruc.edu.cn, zhang-jing@ruc.edu.cn).

Licensing & Compatibility

The README does not explicitly state the project's license. This omission is a significant adoption blocker, particularly for commercial or closed-source integration.

Limitations & Caveats

For the Spider benchmark, the current implementation covers only single-table queries. TableLLM-8B is built on the 8B-parameter Llama3.1-8B, so inference likely requires a GPU with substantial memory. The missing license is also a critical caveat.
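
For a sense of scale on the hardware point, the back-of-the-envelope estimate below computes the memory needed for the model weights alone at a few numeric precisions. It assumes a nominal 8 billion parameters (the "8B" in the model name, not an exact count) and ignores activations, the KV cache, and framework overhead, so real requirements will be higher.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for model weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: nominal parameter count implied by the "8B" name.
NOMINAL_PARAMS = 8e9

for name, width in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{name}: ~{weight_memory_gb(NOMINAL_PARAMS, width):.0f} GB")
```

At 16-bit precision this comes to roughly 16 GB for weights alone.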

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
