beir by beir-cellar

IR benchmark for evaluating NLP retrieval models

created 4 years ago
1,901 stars

Top 23.4% on sourcepulse

Project Summary

BEIR (A Heterogeneous Benchmark for Information Retrieval) provides a standardized framework and a diverse collection of 15+ datasets for evaluating NLP-based retrieval models. It is designed for researchers and practitioners in information retrieval and NLP who need to assess model effectiveness across domains and tasks, particularly in zero-shot settings.

How It Works

BEIR offers a unified API for loading datasets, integrating retrieval models (lexical, dense, sparse, and re-ranking), and evaluating their performance using standard metrics like NDCG@k, MAP@k, and Recall@k. It supports various embedding models, including Sentence-BERT and Hugging Face Transformers, with flexible pooling strategies and pre/post-processing options. The framework facilitates reproducible research by providing reference implementations and a common evaluation pipeline.
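To make the evaluation step concrete, here is a minimal sketch of NDCG@k computed over BEIR-style nested dicts: qrels map query IDs to `{doc_id: relevance}`, and retrieval results map query IDs to `{doc_id: score}`. The helper `ndcg_at_k` and the toy data are illustrative, not part of the BEIR API (BEIR itself delegates metric computation to an evaluation backend):

```python
import math

def ndcg_at_k(qrels, results, k=10):
    """Average NDCG@k over queries, using BEIR-style nested dicts."""
    scores = []
    for qid, doc_scores in results.items():
        rels = qrels.get(qid, {})
        # Rank retrieved docs by descending model score, truncate at k.
        ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
        dcg = sum(rels.get(doc, 0) / math.log2(i + 2) for i, doc in enumerate(ranked))
        # Ideal DCG: judged docs sorted by true relevance.
        ideal = sorted(rels.values(), reverse=True)[:k]
        idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Toy data in the same nested-dict shape BEIR's loaders and retrievers produce.
qrels = {"q1": {"d1": 1, "d3": 2}}                   # graded relevance judgments
results = {"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.7}}  # retrieval scores

print(round(ndcg_at_k(qrels, results, k=10), 4))  # → 0.7602
```

The same nested-dict convention lets any custom retriever plug into the evaluation step: produce a `{query_id: {doc_id: score}}` dict and score it against the dataset's qrels.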

Quick Start & Requirements

  • Install via pip: pip install beir
  • Tested with Python 3.9+.
  • Requires downloading datasets, which are publicly available.
  • Official examples and tutorials are available on the Wiki: https://github.com/beir-cellar/beir/wiki
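Each downloaded dataset unpacks into a common layout that BEIR's loaders expect; a representative structure (using SciFact as an example — exact splits such as train/dev/test vary by dataset) looks like:

```
scifact/
├── corpus.jsonl   # one JSON object per document: {"_id": ..., "title": ..., "text": ...}
├── queries.jsonl  # one JSON object per query:    {"_id": ..., "text": ...}
└── qrels/
    └── test.tsv   # tab-separated: query-id, corpus-id, relevance score
```

Any custom dataset converted into this layout can be evaluated with the same pipeline as the bundled benchmarks.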

Highlighted Details

  • Supports 17 diverse IR benchmark datasets, including MSMARCO, TREC-COVID, NQ, and HotpotQA.
  • Enables evaluation of various retrieval model types: lexical, dense, sparse, and re-ranking.
  • Provides a common framework for adding and evaluating custom models.
  • Offers a leaderboard for comparing model performance across datasets: https://eval.ai/web/challenges/challenge-page/1897

Maintenance & Community

  • Developed through collaboration between UKP Lab (TU Darmstadt), University of Waterloo, and Hugging Face.
  • Key contributors include Nandan Thakur, Nils Reimers, and Iryna Gurevych.
  • Contact via email or GitHub issues for support.

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but the included datasets are subject to their own terms. The README explicitly states: "It remains the user's responsibility to determine whether you as a user have permission to use the dataset under the dataset's license."

Limitations & Caveats

  • Some datasets are not publicly available or require specific reproduction steps.
  • The disclaimer emphasizes that users are responsible for dataset licensing and usage rights.
Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 120 stars in the last 90 days
