ToolQA by night-chen

Dataset for evaluating LLMs using external tools

created 2 years ago
272 stars

Top 95.6% on sourcepulse

Project Summary

ToolQA is an open-source dataset designed to evaluate the capabilities of Large Language Models (LLMs) in answering complex questions that require the use of external tools. It targets researchers and developers working on tool-augmented LLMs, offering a benchmark for compositional tool usage and challenging question-answering scenarios.

How It Works

The dataset contains questions across eight domains, split into "Easy" and "Hard" difficulty levels according to the complexity of the toolchain each question requires. Questions are designed so that they cannot be answered from memorized knowledge alone and instead require composing multiple tools. The dataset also includes reference corpora (text, databases, graphs) for the tools to query, along with data generation code for creating new questions.
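
As a rough illustration of compositional tool use, the sketch below chains two toy tools to answer one question. The tool names (`lookup_rows`, `average`), the `run_toolchain` helper, the record layout, and the sample rows are all hypothetical and invented for this example; they are not ToolQA's actual tools or data.

```python
from __future__ import annotations

from typing import Any, Callable

# Toy rows standing in for a tabular reference corpus (made-up values).
FLIGHTS = [
    {"origin": "ATL", "dest": "JFK", "delay_min": 12},
    {"origin": "ATL", "dest": "JFK", "delay_min": 30},
    {"origin": "ATL", "dest": "LAX", "delay_min": 5},
]


def lookup_rows(rows: list[dict], **filters: Any) -> list[dict]:
    """Hypothetical database tool: keep rows matching every filter."""
    return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


def average(rows: list[dict], column: str) -> float:
    """Hypothetical calculator tool: mean of one numeric column."""
    values = [r[column] for r in rows]
    return sum(values) / len(values)


def run_toolchain(data: Any, steps: list[tuple[Callable, dict]]) -> Any:
    """Feed each tool's output into the next tool in the chain."""
    for tool, kwargs in steps:
        data = tool(data, **kwargs)
    return data


# "What is the average delay of flights from ATL to JFK?"
chain = [
    (lookup_rows, {"origin": "ATL", "dest": "JFK"}),
    (average, {"column": "delay_min"}),
]
answer = run_toolchain(FLIGHTS, chain)
print(answer)  # 21.0
```

Neither tool can answer the question alone: the lookup has no arithmetic and the calculator has no data. A longer chain would correspond to a harder question in the sense used here, though the exact cutoff between Easy and Hard is not stated in this summary.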

Quick Start & Requirements

  • Data Download: Requires downloading external corpora and raw data for each of the eight domains (Flight, Coffee, Yelp, Airbnb, DBLP, GSM8K, SciREX, Agenda). Specific instructions and links are provided for each.
  • Tool Setup:
    • Retriever: Uses LangChain with a Chroma vector database; a pre-processed Chroma DB is available for download.
    • SQL Interpreter: Requires loading databases into MySQL, which can take hours.
    • Math Calculator: Requires a WolframAlpha developer account.
  • Dependencies: Python, LangChain, Chroma, MySQL, WolframAlpha API.
  • Resources: Significant disk space for the datasets, and potentially hours for database setup.
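
The SQL interpreter step amounts to bulk-loading tabular data into a database and then running queries against it. The sketch below shows that shape using Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server. The table name, columns, rows, and the `run_sql` helper are hypothetical; the real setup should follow the repository's own loading instructions.

```python
import sqlite3

# Made-up rows for illustration only; not taken from the dataset.
rows = [
    ("Blue Door Cafe", "Atlanta", 4.5),
    ("Corner Diner", "Atlanta", 3.0),
    ("Harbor Grill", "Boston", 4.0),
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a MySQL database
conn.execute("CREATE TABLE business (name TEXT, city TEXT, stars REAL)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", rows)
conn.commit()


def run_sql(query: str, params: tuple = ()) -> list:
    """Hypothetical SQL tool: execute a read query and return all rows."""
    return conn.execute(query, params).fetchall()


result = run_sql(
    "SELECT name FROM business WHERE city = ? AND stars >= ? ORDER BY name",
    ("Atlanta", 4.0),
)
print(result)
```

Passing values as parameters rather than formatting them into the query string keeps the tool safe to call with model-generated arguments.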

Highlighted Details

  • 8 distinct domains with diverse knowledge formats (tabular, text, graph).
  • Questions designed to avoid memorization and require multi-tool composition.
  • Two difficulty levels (Easy/Hard) based on toolchain length.
  • Includes dataset statistics, data generation code, and baseline implementations.
  • Pre-processed Chroma vector database available for retriever setup.
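
To show what the retriever tool does conceptually, here is a minimal sketch that ranks passages against a query. It deliberately swaps in bag-of-words cosine similarity from the standard library in place of the embedding-based Chroma store the project uses. The passages and the `tokenize`, `cosine`, and `retrieve` functions are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Made-up passages standing in for a text corpus.
PASSAGES = [
    "The team meeting is scheduled for Monday at the library.",
    "Alice will attend a cooking class on Friday evening.",
    "The quarterly report is due at the end of the month.",
]


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: cosine(q, tokenize(p)), reverse=True)
    return ranked[:k]


top = retrieve("When is Alice's cooking class?", PASSAGES)
print(top[0])
```

A vector store such as Chroma plays the same role at scale: passages are embedded once and stored, which is why a pre-processed database saves setup time.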

Maintenance & Community

The data and code are still being cleaned and are being released gradually. Contact yczhuang@gatech.edu with questions.

Licensing & Compatibility

The repository is described as open source, but the README does not explicitly state licensing terms or any restrictions on commercial use.

Limitations & Caveats

The data and code are in the final stages of cleaning and will be released gradually. Some components, like the SQL interpreter setup, can be time-consuming. The license for commercial use is not specified.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

12 stars in the last 90 days
