TAG-Bench by TAG-Research

Benchmark for table-augmented generation (TAG) research

created 11 months ago
747 stars

Top 47.4% on sourcepulse

Project Summary

TAG-Bench provides a benchmark and framework for Table-Augmented Generation (TAG), a paradigm for answering natural language questions over databases by unifying Large Language Models (LLMs) with database interactions. It targets researchers and practitioners in NLP and database communities, offering a standardized way to evaluate and advance methods that go beyond simple Text2SQL or RAG.

How It Works

TAG extends traditional Text2SQL and RAG by enabling more complex interactions between LLMs and databases. The TAG v1 benchmark, derived from BIRD, includes 80 queries requiring either world knowledge or semantic reasoning beyond explicit database content. This approach aims to capture a broader spectrum of database-interaction tasks, highlighting the limitations of current methods and motivating new research directions.
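
To make the pattern concrete, here is a minimal sketch of the three TAG steps (query synthesis, query execution, answer generation). This is not the repository's implementation; call_llm is a hypothetical placeholder for whatever language model server you configure.

```python
# Minimal sketch of the TAG pattern; not the repository's implementation.
# `call_llm` is a hypothetical placeholder for your configured LM server.
import sqlite3


def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your language model server")


def tag_answer(question: str, db_path: str, schema: str) -> str:
    # 1. Query synthesis: the LM translates the question into SQL,
    #    as in plain Text2SQL.
    sql = call_llm(f"Schema:\n{schema}\n\nWrite a SQLite query for: {question}")

    # 2. Query execution: run the generated SQL against the database.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()

    # 3. Answer generation: the LM reasons over the returned rows,
    #    supplying world knowledge or semantic judgment that the
    #    database alone cannot express.
    return call_llm(f"Question: {question}\nRows: {rows}\nAnswer concisely:")
```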

Quick Start & Requirements

  • Install: create a conda environment (conda create -n tag python=3.10 -y), then run pip install -r requirements.txt and pip install -e . inside it.
  • Prerequisites: Python 3.10, conda, git, bash, and a language model server (see the LOTUS documentation for configuration). A GPU is recommended for indexing.
  • Setup: download databases (get_dbs.sh), build indexes (embed_all_dfs.sh), and generate Text2SQL prompts (get_text2sql_prompts.sh); a sanity-check snippet follows this list.
  • Links: LOTUS documentation (for LM server configuration).
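
After running the setup scripts, a quick sanity check is to open one of the downloaded SQLite databases with Python's built-in sqlite3 module. The path below is hypothetical; point it at wherever get_dbs.sh placed the files.

```python
# Sanity check after setup: open a downloaded SQLite database and list
# its tables. The path is hypothetical; adjust it to wherever
# get_dbs.sh placed the databases.
import sqlite3

db_path = "path/to/dev_databases/example_db/example_db.sqlite"  # hypothetical
with sqlite3.connect(db_path) as conn:
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()

print([name for (name,) in tables])
```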

Highlighted Details

  • Evaluates methods like hand-written TAG, Text2SQL, Text2SQL+LM, RAG, and RAG+LM.
  • Benchmark queries include match-based, comparison, ranking, and aggregation types.
  • Of the 80 queries, 40 require parametric (world) knowledge and 40 require semantic reasoning.
  • An analysis script (analyze.py) computes accuracy and latency; see the illustrative sketch below.
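
For illustration, the kind of computation such an analysis performs might look like the sketch below. The record fields are hypothetical; the README does not document analyze.py's actual input format.

```python
# Illustrative accuracy/latency computation in the spirit of analyze.py.
# The record fields are hypothetical; the script's real input format is
# not documented in the README.
from statistics import mean

results = [
    {"predicted": "A", "gold": "A", "latency_s": 2.1},
    {"predicted": "B", "gold": "C", "latency_s": 3.4},
]

accuracy = mean(r["predicted"] == r["gold"] for r in results)  # exact match
avg_latency = mean(r["latency_s"] for r in results)
print(f"accuracy={accuracy:.2f}, avg latency={avg_latency:.2f}s")
```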

Maintenance & Community

  • Project maintained by TAG-Research.
  • No community links (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The README does not state a license; check the repository for a LICENSE file before reuse. The project installs as a standard Python package (requirements.txt plus an editable pip install).

Limitations & Caveats

The benchmark is an initial release (v1) and covers only a subset of query types. Reproducing results requires configuring a language model server through LOTUS, a step this README does not detail.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 27 stars in the last 90 days
