QA dataset for evaluating multi-document, metadata-aware RAG pipelines
This repository provides the MultiHop-RAG dataset and associated code for evaluating Retrieval-Augmented Generation (RAG) systems. The dataset targets the challenge of complex, multi-document reasoning in RAG: the evidence for each query spans 2-4 documents and incorporates document metadata. It is aimed at researchers and developers building advanced RAG applications.
How It Works
The dataset is designed to test RAG pipelines by requiring them to retrieve and synthesize information from multiple documents, mimicking real-world scenarios. It includes 2,556 queries, each with evidence distributed across several documents, and uses document metadata to further challenge retrieval and generation models. The provided scripts demonstrate how to use the dataset for retrieval tasks, including options for reranking, and for end-to-end question answering with models like Llama.
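As an illustration of the multi-hop structure, the sketch below inspects a single query record and counts how many distinct documents its evidence draws on. The field names (query, answer, evidence_list, source) are assumptions about the released JSON schema, shown here on inline sample data rather than the actual dataset files.

```python
# Sketch: counting how many distinct source documents a query's
# evidence spans. Field names (query, answer, evidence_list, source)
# are assumed, not confirmed against the released files.
sample_record = {
    "query": "Which outlet reported X after the launch covered by Y?",
    "answer": "Example answer",
    "evidence_list": [
        {"source": "Outlet A", "fact": "First supporting fact."},
        {"source": "Outlet B", "fact": "Second supporting fact."},
        {"source": "Outlet A", "fact": "Follow-up detail."},
        {"source": "Outlet C", "fact": "Third supporting fact."},
    ],
}

def distinct_sources(record):
    """Return the set of documents the record's evidence is drawn from."""
    return {item["source"] for item in record["evidence_list"]}

sources = distinct_sources(sample_record)
print(len(sources))  # a valid MultiHop-RAG query spans 2-4 documents
```

A retriever that surfaces chunks from only one of these sources cannot support the answer, which is what makes the benchmark multi-hop.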
Quick Start & Requirements
Install the pinned dependency:
    pip install llama-index==0.9.40
Run retrieval with a chosen embedding model:
    python simple_retrieval.py --retriever BAAI/llm-embedder
Run end-to-end question answering:
    python qa_llama.py
Evaluate the results with retrieval_evaluate.py and qa_evaluate.py.
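The evaluation scripts themselves are not reproduced here, but a hits@k-style retrieval check of the general kind retrieval_evaluate.py performs can be sketched as follows. The metric choice and data shapes are assumptions for illustration, not the script's actual code.

```python
# Sketch: hits@k for retrieval, i.e. the fraction of a query's gold
# evidence chunks that appear among the top-k retrieved chunks.
# Data shapes are illustrative, not the repository's actual format.
def hits_at_k(retrieved, gold, k):
    """retrieved: chunk ids ranked by score; gold: set of gold chunk ids."""
    top_k = set(retrieved[:k])
    return len(top_k & gold) / len(gold)

# Toy example: 2 of the 3 gold chunks are recovered within the top 4.
retrieved = ["c7", "c2", "c9", "c1", "c5"]
gold = {"c2", "c1", "c8"}
score = hits_at_k(retrieved, gold, k=4)
print(round(score, 3))
```

Because evidence is spread over several documents, per-chunk recall like this is a stricter test than asking whether any single relevant chunk was found.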
Maintenance & Community
The repository is associated with the COLM 2024 paper "MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries." The construction pipeline code is open-sourced for research purposes but is noted as not yet tidy, with plans for future organization.
Licensing & Compatibility
MultiHop-RAG is licensed under ODC-BY. This license is generally permissive for commercial use and integration, but users should review the specific terms of the Open Data Commons Attribution License.
Limitations & Caveats
The code for dataset construction is currently described as "not very tidy" and is intended for research purposes, suggesting potential instability or lack of comprehensive documentation for that specific component.