Tabular LLM: LLM fine-tuning for table understanding
This project aims to build large language models specifically for tabular intelligence tasks by collecting and formatting open-source datasets for instruction fine-tuning. It targets researchers and practitioners looking to enhance LLMs' capabilities in understanding and processing tabular data, offering a unified platform and curated datasets for tasks like question answering and text generation.
How It Works
The project leverages the Alpaca-CoT framework for instruction fine-tuning LLMs. It standardizes diverse tabular datasets into an instruction-following format, incorporating Chain-of-Thought (CoT) reasoning where available. The approach focuses on enhancing LLMs' comprehension of various table structures and tasks, with a commitment to open-sourcing processed data and fine-tuned models.
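As a concrete illustration of this standardization step, the sketch below wraps a table QA example into an Alpaca-style instruction record with a Chain-of-Thought answer. The helper functions, prompt wording, and example data are assumptions for illustration, not the project's actual conversion script.

```python
import json

def to_markdown(rows):
    """Serialize table rows (header row first) as a markdown table string."""
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)

def to_instruction_sample(question, rows, cot_answer):
    """Wrap a table QA example in an Alpaca-style instruction record (illustrative)."""
    return {
        "instruction": "Answer the question based on the table below.",
        "input": f"{to_markdown(rows)}\n\nQuestion: {question}",
        "output": cot_answer,  # CoT rationale kept in the target when available
    }

sample = to_instruction_sample(
    "Which city has the larger population?",
    [["City", "Population"], ["Tokyo", "37M"], ["Delhi", "32M"]],
    "The table lists Tokyo with 37M and Delhi with 32M; 37M > 32M, so the answer is Tokyo.",
)
print(json.dumps(sample, indent=2, ensure_ascii=False))
```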
Quick Start & Requirements
Highlighted Details
Each processed sample follows a unified format with the fields input, output, table_rows, table_repr, table_repr_type, table_type, and task_type.
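Based on those field names, one processed record might look like the following; the concrete values and the table_type/task_type labels are invented for illustration and are not taken from the project's data.

```python
# Hypothetical record shaped after the field list above.
record = {
    "input": "Answer the question based on the table below.\n"
             "| City | Population |\n| --- | --- |\n| Tokyo | 37M |\n\n"
             "Question: Which city is listed?",
    "output": "Tokyo",
    "table_rows": [["City", "Population"], ["Tokyo", "37M"]],
    "table_repr": "| City | Population |\n| --- | --- |\n| Tokyo | 37M |",
    "table_repr_type": "markdown",
    "table_type": "flat",     # assumed label; the actual taxonomy may differ
    "task_type": "table_qa",  # assumed label; the actual taxonomy may differ
}
```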
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Some datasets are still in the older format (2023-05-08), and not all task/dataset combinations have updated sample counts. The project primarily focuses on text-based table representation, acknowledging document intelligence as an alternative for tables embedded within broader document contexts.
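To make "text-based table representation" concrete, the small sketch below renders the same rows as an HTML string, one of several possible text serializations alongside the markdown form shown earlier; the helper name is illustrative, not from the project.

```python
def rows_to_html(rows):
    """Render table rows (header row first) as a minimal HTML table string."""
    header, *body = rows
    head = "<tr>" + "".join(f"<th>{c}</th>" for c in header) + "</tr>"
    body_html = "".join(
        "<tr>" + "".join(f"<td>{c}</td>" for c in r) + "</tr>" for r in body
    )
    return f"<table>{head}{body_html}</table>"

print(rows_to_html([["City", "Population"], ["Tokyo", "37M"]]))
```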