MultiHop-RAG by yixuantt

QA dataset for evaluating multi-document, metadata-aware RAG pipelines

created 1 year ago
348 stars

Project Summary

This repository provides the MultiHop-RAG dataset and associated code for evaluating Retrieval-Augmented Generation (RAG) systems. It targets complex, multi-document reasoning: the evidence for each query spans 2 to 4 documents, and the dataset incorporates document metadata. It is aimed at researchers and developers building advanced RAG applications.
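To make the shape of a multi-hop query concrete, here is a minimal Python sketch of one query record. The class and field names (MultiHopQuery, Evidence, doc_id, source, published_at) are illustrative assumptions, not the dataset's actual JSON schema, and the example values are made up. The check simply encodes the 2-to-4-document property described above.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    # Hypothetical fields: one supporting fact plus metadata about its source document.
    doc_id: str
    fact: str
    source: str
    published_at: str


@dataclass
class MultiHopQuery:
    query: str
    answer: str
    evidence: list[Evidence] = field(default_factory=list)

    def num_documents(self) -> int:
        # Count distinct documents, since one document may contribute several facts.
        return len({e.doc_id for e in self.evidence})

    def is_multi_hop(self, low: int = 2, high: int = 4) -> bool:
        # MultiHop-RAG queries draw on evidence from 2 to 4 documents.
        return low <= self.num_documents() <= high


# Made-up example record with evidence from two documents.
q = MultiHopQuery(
    query="Did outlet A and outlet B report the same launch date?",
    answer="Yes",
    evidence=[
        Evidence("doc-1", "Outlet A: launch on May 3.", "Outlet A", "2023-10-01"),
        Evidence("doc-2", "Outlet B: launch on May 3.", "Outlet B", "2023-10-02"),
    ],
)
print(q.num_documents(), q.is_multi_hop())  # 2 True
```

Counting distinct doc_id values, rather than evidence entries, keeps a query whose facts all come from one document from being mistaken for a multi-hop query.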

How It Works

The dataset is designed to test RAG pipelines by requiring them to retrieve and synthesize information from multiple documents, mimicking real-world scenarios. It includes 2,556 queries, each with evidence distributed across several documents, and uses document metadata to further challenge retrieval and generation models. The provided scripts show how to use the dataset for retrieval (optionally with reranking) and for end-to-end question answering with models such as Llama.
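The retrieve-then-rerank pattern that the retrieval scripts support can be sketched in plain Python. This is a toy under stated assumptions: the bag-of-words embed function stands in for a dense embedding model, and the overlap-based rerank function stands in for a real reranker such as a cross-encoder. None of these function names come from the repository.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would use a dense model
    # such as the BAAI/llm-embedder retriever mentioned in the Quick Start.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int) -> list[str]:
    # Stage 1: score every chunk against the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    # Stage 2: re-score only the stage-1 candidates with a second scorer.
    # Stand-in scorer: number of distinct query terms present in the chunk.
    terms = set(query.lower().split())
    return sorted(candidates, key=lambda c: len(terms & set(c.lower().split())), reverse=True)[:top_n]


chunks = ["apple banana", "cherry date", "apple cherry"]
candidates = retrieve("apple cherry", chunks, k=2)
print(rerank("apple cherry", candidates, top_n=1))  # ['apple cherry']
```

The point of the two stages is cost: the cheap first-stage scorer runs over the whole corpus, while the more expensive reranker only sees the k survivors. Python's sort is stable, so ties in the reranker keep their stage-1 order.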

Quick Start & Requirements

  • Install the pinned dependency via pip: pip install llama-index==0.9.40
  • Run retrieval example: python simple_retrieval.py --retriever BAAI/llm-embedder
  • Run QA example: python qa_llama.py
  • Evaluation scripts: retrieval_evaluate.py and qa_evaluate.py
  • Requires Python 3.x.
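Retrieval evaluation for multi-document evidence compares each query's ranked list of retrieved chunks against its gold evidence. The sketch below computes two common metrics: Hits@k (here defined as the fraction of gold evidence found in the top k) and MRR@k (the reciprocal rank of the first gold item within the top k). Definitions vary between papers and tools, so treat this as an assumption rather than a description of what retrieval_evaluate.py does. The function names are hypothetical.

```python
def hits_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    # Fraction of the gold evidence items that appear in the top-k results.
    if not gold:
        return 0.0
    return len(gold & set(retrieved[:k])) / len(gold)


def mrr_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    # Reciprocal rank of the first gold item in the top k, or 0 if none appears.
    for rank, item in enumerate(retrieved[:k], start=1):
        if item in gold:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    # Average both metrics over (retrieved_ids, gold_ids) pairs, one per query.
    if not runs:
        return {f"hits@{k}": 0.0, f"mrr@{k}": 0.0}
    n = len(runs)
    return {
        f"hits@{k}": sum(hits_at_k(r, g, k) for r, g in runs) / n,
        f"mrr@{k}": sum(mrr_at_k(r, g, k) for r, g in runs) / n,
    }


runs = [
    (["d3", "d1", "d2"], {"d1", "d2"}),  # one of two gold docs in the top 2, first hit at rank 2
    (["d5", "d6"], {"d4"}),              # gold doc never retrieved
]
print(evaluate(runs, k=2))  # {'hits@2': 0.25, 'mrr@2': 0.25}
```

Measuring coverage of all gold evidence matters more here than in single-hop QA, because a multi-hop answer can fail if even one of its 2 to 4 supporting documents is missing.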

Highlighted Details

  • Dataset contains 2,556 queries with evidence across 2 to 4 documents.
  • Includes document metadata for realistic RAG evaluation.
  • Sample scripts for retrieval (with reranking) and QA are provided.
  • Evaluation scripts for both retrieval and QA are available.
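For end-to-end QA, one simple scoring rule is to count a response as correct when the normalized gold answer appears in it as a whole phrase. This rule and the helper names below are assumptions for illustration; qa_evaluate.py may score answers differently.

```python
import re
import string


def normalize(text: str) -> str:
    # Lowercase, strip ASCII punctuation, and collapse runs of whitespace.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def is_correct(response: str, gold: str) -> bool:
    # Pad with spaces so the gold answer must match whole words.
    g = normalize(gold)
    if not g:
        return False
    return f" {g} " in f" {normalize(response)} "


def accuracy(pairs: list[tuple[str, str]]) -> float:
    # pairs holds (model_response, gold_answer) tuples.
    if not pairs:
        return 0.0
    return sum(is_correct(r, g) for r, g in pairs) / len(pairs)


pairs = [("The answer is Yes.", "yes"), ("Not sure.", "no")]
print(accuracy(pairs))  # 0.5
```

Padding with spaces keeps short answers such as "no" from matching inside longer words like "not", which plain substring matching would accept.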

Maintenance & Community

The repository accompanies the COLM 2024 paper "MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries." The dataset construction pipeline is open-sourced for research purposes; the authors note that the code is not yet tidy and plan to clean it up.

Licensing & Compatibility

MultiHop-RAG is released under the Open Data Commons Attribution License (ODC-BY), which permits commercial use, redistribution, and adaptation provided attribution is given. Review the license text for the exact terms before integrating the dataset.

Limitations & Caveats

The authors describe the dataset-construction code as "not very tidy" and intend it for research use, so expect rough edges and limited documentation in that component.

Health Check

Last commit: 4 months ago
Responsiveness: 1 day
Pull requests (30d): 0
Issues (30d): 0
Star history: 49 stars in the last 90 days
