QA dataset for evaluating multi-document, metadata-aware RAG pipelines
This repository provides the MultiHop-RAG dataset and associated code for evaluating Retrieval-Augmented Generation (RAG) systems. The dataset targets the challenge of complex, multi-document reasoning in RAG: the evidence for each query spans 2-4 documents and incorporates document metadata. It is aimed at researchers and developers building advanced RAG applications.
How It Works
The dataset is designed to test RAG pipelines by requiring them to retrieve and synthesize information from multiple documents, mimicking real-world scenarios. It includes 2,556 queries, each with evidence distributed across several documents, and uses document metadata to further challenge retrieval and generation models. The provided scripts demonstrate how to use the dataset for retrieval tasks, including options for reranking, and for end-to-end question answering with models like Llama.
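As an illustration of the multi-hop structure, the sketch below inspects a single query record and counts how many distinct documents its evidence draws on. The field names (query, answer, evidence_list, source) are assumptions about the released JSON schema, shown here on inline sample data rather than the actual dataset files.

```python
# Sketch: counting how many distinct source documents a query's
# evidence spans. Field names (query, answer, evidence_list, source)
# are assumed, not confirmed against the released files.
sample_record = {
    "query": "Which outlet reported X after the launch covered by Y?",
    "answer": "Example answer",
    "evidence_list": [
        {"source": "Outlet A", "fact": "First supporting fact."},
        {"source": "Outlet B", "fact": "Second supporting fact."},
        {"source": "Outlet A", "fact": "Follow-up detail."},
        {"source": "Outlet C", "fact": "Third supporting fact."},
    ],
}

def distinct_sources(record):
    """Return the set of documents the record's evidence is drawn from."""
    return {item["source"] for item in record["evidence_list"]}

sources = distinct_sources(sample_record)
print(len(sources))  # a valid MultiHop-RAG query spans 2-4 documents
```

A retriever that surfaces chunks from only one of these sources cannot support the answer, which is what makes the benchmark multi-hop.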
Quick Start & Requirements
Install the pinned dependency:
    pip install llama-index==0.9.40
Run retrieval with a chosen embedding model:
    python simple_retrieval.py --retriever BAAI/llm-embedder
Run end-to-end question answering:
    python qa_llama.py
Evaluate the results with retrieval_evaluate.py and qa_evaluate.py.
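The evaluation scripts themselves are not reproduced here, but a hits@k-style retrieval check of the general kind retrieval_evaluate.py performs can be sketched as follows. The metric choice and data shapes are assumptions for illustration, not the script's actual code.

```python
# Sketch: hits@k for retrieval, i.e. the fraction of a query's gold
# evidence chunks that appear among the top-k retrieved chunks.
# Data shapes are illustrative, not the repository's actual format.
def hits_at_k(retrieved, gold, k):
    """retrieved: chunk ids ranked by score; gold: set of gold chunk ids."""
    top_k = set(retrieved[:k])
    return len(top_k & gold) / len(gold)

# Toy example: 2 of the 3 gold chunks are recovered within the top 4.
retrieved = ["c7", "c2", "c9", "c1", "c5"]
gold = {"c2", "c1", "c8"}
score = hits_at_k(retrieved, gold, k=4)
print(round(score, 3))
```

Because evidence is spread over several documents, per-chunk recall like this is a stricter test than asking whether any single relevant chunk was found.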
Maintenance & Community
The repository is associated with the COLM 2024 paper "MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries." The construction pipeline code is open-sourced for research purposes but is noted as not yet tidy, with plans for future organization.
Licensing & Compatibility
MultiHop-RAG is licensed under ODC-BY. This license is generally permissive for commercial use and integration, but users should review the specific terms of the Open Data Commons Attribution License.
Limitations & Caveats
The code for dataset construction is currently described as "not very tidy" and is intended for research purposes, suggesting potential instability or lack of comprehensive documentation for that specific component.