BEIR (A Heterogeneous Benchmark for Information Retrieval) provides a standardized framework and a diverse collection of 15+ datasets for evaluating natural-language-processing (NLP) retrieval models. It is designed for researchers and practitioners in information retrieval and NLP who need to assess model effectiveness across domains and tasks, particularly in zero-shot settings.
How It Works
BEIR offers a unified API for loading datasets, integrating retrieval models (lexical, dense, sparse, and re-ranking), and evaluating their performance using standard metrics like NDCG@k, MAP@k, and Recall@k. It supports various embedding models, including Sentence-BERT and Hugging Face Transformers, with flexible pooling strategies and pre/post-processing options. The framework facilitates reproducible research by providing reference implementations and a common evaluation pipeline.
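The evaluation pipeline can be illustrated end-to-end with plain dictionaries in BEIR's common corpus/queries/qrels format and a hand-rolled NDCG@k. All IDs and texts below are hypothetical, and the word-overlap "retriever" is a toy stand-in for a real model, not the library's API:

```python
import math

# Toy data in BEIR's common format: corpus maps doc_id -> {"title", "text"},
# queries maps query_id -> text, qrels maps query_id -> {doc_id: relevance}.
corpus = {
    "d1": {"title": "Vitamin C", "text": "vitamin c and the common cold"},
    "d2": {"title": "Exercise", "text": "exercise improves heart health"},
    "d3": {"title": "Sleep", "text": "sleep deprivation and memory"},
}
queries = {"q1": "does vitamin c help the common cold"}
qrels = {"q1": {"d1": 1}}

def retrieve(corpus, query, k=2):
    """Rank documents by word overlap with the query (toy lexical model)."""
    q_terms = set(query.split())
    scored = {
        doc_id: len(q_terms & set((doc["title"] + " " + doc["text"]).lower().split()))
        for doc_id, doc in corpus.items()
    }
    return dict(sorted(scored.items(), key=lambda x: -x[1])[:k])

def ndcg_at_k(qrels, results, k):
    """Average NDCG@k over all queries, as BEIR-style evaluation reports it."""
    scores = []
    for qid, ranking in results.items():
        top = sorted(ranking.items(), key=lambda x: -x[1])[:k]
        dcg = sum(qrels[qid].get(d, 0) / math.log2(i + 2) for i, (d, _) in enumerate(top))
        ideal = sorted(qrels[qid].values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores)

results = {qid: retrieve(corpus, q) for qid, q in queries.items()}
print(round(ndcg_at_k(qrels, results, k=2), 4))
```

Here the single relevant document is retrieved at rank 1, so NDCG@2 is 1.0; swapping in a real model only changes how `results` is produced, not how it is scored.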
Quick Start & Requirements
- Install via pip: `pip install beir`
- Tested with Python 3.9+.
- Requires downloading datasets, which are publicly available.
- Official examples and tutorials are available on the Wiki: https://github.com/beir-cellar/beir/wiki
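A typical quick start looks like the sketch below, assuming the public SciFact dataset and one example Sentence-BERT checkpoint (`msmarco-distilbert-base-tas-b`); it requires a network download, so treat it as illustrative rather than a ready-made script:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download and unzip one of the publicly available BEIR datasets.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

# Dense retrieval with a Sentence-BERT model (model name is one example choice).
model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="dot")
results = retriever.retrieve(corpus, queries)

# Standard metrics at the default cutoffs (k = 1, 3, 5, 10, 100, 1000).
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```

Other model types (lexical, sparse, re-ranking) slot into the same `EvaluateRetrieval` pipeline; the Wiki linked above has worked examples for each.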
Highlighted Details
- Supports 17 diverse IR benchmark datasets, including MSMARCO, TREC-COVID, NQ, and HotpotQA.
- Enables evaluation of various retrieval model types: lexical, dense, sparse, and re-ranking.
- Provides a common framework for adding and evaluating custom models.
- Offers a leaderboard for comparing model performance across datasets: https://eval.ai/web/challenges/challenge-page/1897
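For the custom-model point above: BEIR's dense retrieval expects a duck-typed interface with `encode_queries` and `encode_corpus` methods that return one vector per input. The class below is a hypothetical stand-in encoder (character-trigram hashing, not a real model), with a manual dot-product search in place of the library's exact-search wrapper:

```python
class CharNgramModel:
    """Hypothetical custom model: embeds text by hashed character trigrams."""

    DIM = 64

    def _embed(self, text):
        # Count character trigrams into a fixed-size hashed vector, then
        # L2-normalize so that dot product behaves like cosine similarity.
        vec = [0.0] * self.DIM
        text = text.lower()
        for i in range(len(text) - 2):
            vec[hash(text[i:i + 3]) % self.DIM] += 1.0
        norm = sum(v * v for v in vec) ** 0.5 or 1.0
        return [v / norm for v in vec]

    def encode_queries(self, queries, batch_size=16, **kwargs):
        return [self._embed(q) for q in queries]

    def encode_corpus(self, corpus, batch_size=16, **kwargs):
        # Corpus entries follow BEIR's {"title": ..., "text": ...} convention.
        return [self._embed(d.get("title", "") + " " + d["text"]) for d in corpus]

# Manual dot-product search standing in for the framework's search wrapper:
model = CharNgramModel()
docs = [{"title": "", "text": "beir benchmarks retrieval"},
        {"title": "", "text": "unrelated cooking recipe"}]
q_emb = model.encode_queries(["benchmark for retrieval"])[0]
d_embs = model.encode_corpus(docs)
scores = [sum(q * d for q, d in zip(q_emb, e)) for e in d_embs]
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best]["text"])
```

Any encoder exposing these two methods can be plugged into the framework's dense search and scored with the same metrics as the built-in models.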
Maintenance & Community
- Developed through collaboration between UKP Lab (TU Darmstadt), University of Waterloo, and Hugging Face.
- Key contributors include Nandan Thakur, Nils Reimers, and Iryna Gurevych.
- Contact via email or GitHub issues for support.
Licensing & Compatibility
- The repository itself appears to be under a permissive license, but the included datasets are subject to their own terms. The README explicitly states: "It remains the user's responsibility to determine whether you as a user have permission to use the dataset under the dataset's license."
Limitations & Caveats
- Some datasets are not publicly available or require specific reproduction steps.
- The disclaimer emphasizes that users are responsible for dataset licensing and usage rights.