Dataset for evaluating LLMs using external tools
ToolQA is an open-source dataset designed to evaluate the capabilities of Large Language Models (LLMs) in answering complex questions that require the use of external tools. It targets researchers and developers working on tool-augmented LLMs, offering a benchmark for compositional tool usage and challenging question-answering scenarios.
How It Works
The dataset features questions across eight domains, categorized into "Easy" and "Hard" difficulty levels based on the complexity of toolchains required. Questions are designed to be difficult to memorize, necessitating the compositional use of multiple tools. The dataset includes reference corpora (text, databases, graphs) for tool interaction and provides data generation code for creating new questions.
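As a rough illustration of how a benchmark split into "Easy" and "Hard" levels might be scored, the sketch below loads question records and reports exact-match accuracy per difficulty level. The record layout (the `qid`, `domain`, `difficulty`, `question`, and `answer` fields), the placeholder records, and the helper names are all assumptions made for this example; they are not taken from the ToolQA repository, whose actual file format and evaluation code may differ.

```python
import json
from collections import defaultdict

# Hypothetical placeholder records; field names and values are assumptions,
# not real ToolQA data.
SAMPLE_JSONL = "\n".join([
    json.dumps({"qid": "demo-1", "domain": "flight", "difficulty": "easy",
                "question": "placeholder question A", "answer": "42"}),
    json.dumps({"qid": "demo-2", "domain": "flight", "difficulty": "hard",
                "question": "placeholder question B", "answer": "atlanta"}),
    json.dumps({"qid": "demo-3", "domain": "coffee", "difficulty": "hard",
                "question": "placeholder question C", "answer": "7"}),
])


def load_questions(jsonl_text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(text).lower().split())


def accuracy_by_difficulty(questions, predictions):
    """Return exact-match accuracy for each difficulty level.

    `predictions` maps qid -> predicted answer; missing predictions count as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        level = q["difficulty"]
        total[level] += 1
        if normalize(predictions.get(q["qid"], "")) == normalize(q["answer"]):
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}


questions = load_questions(SAMPLE_JSONL)
# Stand-in for the answers a tool-augmented model would produce.
predictions = {"demo-1": "42", "demo-2": " Atlanta ", "demo-3": "8"}
scores = accuracy_by_difficulty(questions, predictions)
print(scores)
```

Reporting the two levels separately keeps the effect of longer toolchains visible, since a single pooled score would hide how much accuracy drops on the "Hard" questions.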
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project's data and code are still being cleaned and are being released gradually. For questions, contact yczhuang@gatech.edu.
Licensing & Compatibility
The repository is open source, but its license is not specified. The README does not state licensing terms or any restrictions on commercial use.
Limitations & Caveats
The data and code are in the final stages of cleaning and are being released gradually, so some parts may not yet be available. Setting up some components, such as the SQL interpreter, can be time-consuming. Because no license is specified, the terms for commercial use are unclear.