Benchmark for table-augmented generation (TAG) research
TAG-Bench provides a benchmark and framework for Table-Augmented Generation (TAG), a paradigm for answering natural language questions over databases by unifying Large Language Models (LLMs) with database interactions. It targets researchers and practitioners in the NLP and database communities, offering a standardized way to evaluate and advance methods that go beyond simple Text2SQL or RAG.
How It Works
TAG extends traditional Text2SQL and RAG by enabling more complex interactions between LLMs and databases. The TAG v1 benchmark, derived from BIRD, includes 80 queries that require either world knowledge or semantic reasoning beyond the database's explicit content (for example, filtering rows by a subjective property, or answering with facts not stored in any column). The benchmark aims to capture a broader spectrum of database-interaction tasks, highlighting the limitations of current methods and motivating new research directions.
Quick Start & Requirements
Install with `pip install -r requirements.txt` and `pip install -e .` within a conda environment (`conda create -n tag python=3.10 -y`). Requirements include `conda`, `git`, `bash`, and a language model server (see the LOTUS documentation for configuration); a GPU is recommended for indexing. Then download the databases (`get_dbs.sh`), create the indexes (`embed_all_dfs.sh`), and generate Text2SQL prompts (`get_text2sql_prompts.sh`), as shown in the consolidated sketch below.
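A minimal end-to-end setup might look like the following; it assumes the scripts are run from the repository root and that a LOTUS-compatible language model server is already configured:

```bash
# Create and activate an isolated environment (Python 3.10, per the README)
conda create -n tag python=3.10 -y
conda activate tag

# Install dependencies, then the package itself in editable mode
pip install -r requirements.txt
pip install -e .

# Download the benchmark databases
bash get_dbs.sh

# Build the embedding indexes (a GPU is recommended for this step)
bash embed_all_dfs.sh

# Generate the Text2SQL prompts
bash get_text2sql_prompts.sh
```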
Highlighted Details
An evaluation script (`analyze.py`) computes accuracy and latency.
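The command-line interface of `analyze.py` is not documented in this summary, so the following is only a hypothetical invocation:

```bash
# Hypothetical invocation: the real arguments are not listed in this summary.
# Check the repository (e.g., `python analyze.py --help`) for the actual flags.
python analyze.py
```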
Maintenance & Community
The repository was last updated about 4 months ago.
Licensing & Compatibility
The presence of a `requirements.txt` and setup scripts suggests standard Python package compatibility.
Limitations & Caveats
The benchmark is an initial release (v1) and focuses on a subset of query types. Reproducing results requires configuring a specific language model server (LOTUS), which is not detailed within this README.