Research paper code for few-shot tabular data classification using LLMs
TabLLM addresses the challenge of few-shot classification for tabular data by leveraging Large Language Models (LLMs). It targets researchers and practitioners in machine learning and natural language processing who need to perform classification tasks on structured datasets with limited labeled examples. The primary benefit is enabling effective classification without extensive task-specific fine-tuning or large labeled datasets.
How It Works
TabLLM converts tabular data into textual representations so that an LLM can process and classify it. Among the serialization methods tried, the "Text" serialization, which encodes each row as a text string combined with a prompt, proved most effective in experiments. The system then uses the t-few codebase for parameter-efficient fine-tuning of LLMs on these serialized datasets, enabling few-shot learning.
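The row-to-text idea can be sketched as follows. This is a minimal illustration, not the repository's code: the function names `serialize_row` and `build_prompt`, the "The <column> is <value>." template wording, the example columns, and the question text are all assumptions made for the sketch.

```python
from typing import Any, Mapping


def serialize_row(row: Mapping[str, Any]) -> str:
    """Render one table row as 'The <column> is <value>.' sentences.

    The template wording is an assumption chosen for illustration.
    """
    sentences = []
    for column, value in row.items():
        if value is None:  # skip missing cells rather than writing "None"
            continue
        sentences.append(f"The {column} is {value}.")
    return " ".join(sentences)


def build_prompt(row: Mapping[str, Any], question: str) -> str:
    """Append a task question (hypothetical wording) to the serialized row."""
    return f"{serialize_row(row)}\n\n{question}\nAnswer:"


# Hypothetical example row; the column names are not taken from the repository.
example_row = {"age": 42, "education": "Masters", "occupation": None}
prompt = build_prompt(
    example_row,
    "Does this person earn more than 50000 dollars per year? Yes or no?",
)
print(prompt)
```

Each serialized row paired with its label would then serve as one training example for few-shot fine-tuning; how the actual pipeline formats and feeds these strings is defined by the repository and t-few, not by this sketch.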
Quick Start & Requirements
Dependencies: transformers, datasets, sentencepiece, protobuf, xgboost, lightgbm, tabpfn, fsspec, urllib3, importlib-metadata, scikit-learn.
Datasets: create_external_datasets.py is provided to serialize nine public datasets.
Experiment script: ./bin/few-shot-pretrained-100k.sh
Highlighted Details
Maintenance & Community
The project is associated with authors from institutions such as MIT, and the accompanying paper appears in the PMLR proceedings. It cites t-few, PromptSource, and a NeurIPS paper, which ties it to established research efforts. The README mentions no community channels (Discord/Slack) and gives no signals of active maintenance.
Licensing & Compatibility
The repository does not explicitly state a license. However, it relies heavily on the t-few project, which appears to be released under a permissive license (e.g., MIT). Whether commercial use is permitted depends on the licenses of all dependencies and of the underlying LLMs.
Limitations & Caveats
The code for handling private healthcare datasets and for some additional experiments is not included due to privacy concerns. Users may run into dependency issues when setting up the t-few environment, so the provided commands should be followed carefully. Path configuration is critical and may need significant adaptation for each user's setup.