BEIR (A Heterogeneous Benchmark for Information Retrieval) provides a standardized framework and a diverse collection of 15+ datasets for evaluating natural-language-processing (NLP) retrieval models. It is designed for researchers and practitioners in information retrieval and NLP who need to assess model effectiveness across domains and tasks, particularly in zero-shot settings.
How It Works
BEIR offers a unified API for loading datasets, integrating retrieval models (lexical, dense, sparse, and re-ranking), and evaluating their performance using standard metrics like NDCG@k, MAP@k, and Recall@k. It supports various embedding models, including Sentence-BERT and Hugging Face Transformers, with flexible pooling strategies and pre/post-processing options. The framework facilitates reproducible research by providing reference implementations and a common evaluation pipeline.
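The evaluation pipeline can be illustrated end-to-end with plain dictionaries in BEIR's common corpus/queries/qrels format and a hand-rolled NDCG@k. All IDs and texts below are hypothetical, and the word-overlap "retriever" is a toy stand-in for a real model, not the library's API:

```python
import math

# Toy data in BEIR's common format: corpus maps doc_id -> {"title", "text"},
# queries maps query_id -> text, qrels maps query_id -> {doc_id: relevance}.
corpus = {
    "d1": {"title": "Vitamin C", "text": "vitamin c and the common cold"},
    "d2": {"title": "Exercise", "text": "exercise improves heart health"},
    "d3": {"title": "Sleep", "text": "sleep deprivation and memory"},
}
queries = {"q1": "does vitamin c help the common cold"}
qrels = {"q1": {"d1": 1}}

def retrieve(corpus, query, k=2):
    """Rank documents by word overlap with the query (toy lexical model)."""
    q_terms = set(query.split())
    scored = {
        doc_id: len(q_terms & set((doc["title"] + " " + doc["text"]).lower().split()))
        for doc_id, doc in corpus.items()
    }
    return dict(sorted(scored.items(), key=lambda x: -x[1])[:k])

def ndcg_at_k(qrels, results, k):
    """Average NDCG@k over all queries, as BEIR-style evaluation reports it."""
    scores = []
    for qid, ranking in results.items():
        top = sorted(ranking.items(), key=lambda x: -x[1])[:k]
        dcg = sum(qrels[qid].get(d, 0) / math.log2(i + 2) for i, (d, _) in enumerate(top))
        ideal = sorted(qrels[qid].values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores)

results = {qid: retrieve(corpus, q) for qid, q in queries.items()}
print(round(ndcg_at_k(qrels, results, k=2), 4))
```

Here the single relevant document is retrieved at rank 1, so NDCG@2 is 1.0; swapping in a real model only changes how `results` is produced, not how it is scored.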
Quick Start & Requirements
- Install via pip: `pip install beir`
- Tested with Python 3.9+.
- Requires downloading datasets, which are publicly available.
- Official examples and tutorials are available on the Wiki: https://github.com/beir-cellar/beir/wiki
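A typical quick start looks like the sketch below, assuming the public SciFact dataset and one example Sentence-BERT checkpoint (`msmarco-distilbert-base-tas-b`); it requires a network download, so treat it as illustrative rather than a ready-made script:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Download and unzip one of the publicly available BEIR datasets.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

# Dense retrieval with a Sentence-BERT model (model name is one example choice).
model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="dot")
results = retriever.retrieve(corpus, queries)

# Standard metrics at the default cutoffs (k = 1, 3, 5, 10, 100, 1000).
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```

Other model types (lexical, sparse, re-ranking) slot into the same `EvaluateRetrieval` pipeline; the Wiki linked above has worked examples for each.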
Highlighted Details
- Supports 17 diverse IR benchmark datasets, including MSMARCO, TREC-COVID, NQ, and HotpotQA.
- Enables evaluation of various retrieval model types: lexical, dense, sparse, and re-ranking.
- Provides a common framework for adding and evaluating custom models.
- Offers a leaderboard for comparing model performance across datasets: https://eval.ai/web/challenges/challenge-page/1897
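For the custom-model point above: BEIR's dense retrieval expects a duck-typed interface with `encode_queries` and `encode_corpus` methods that return one vector per input. The class below is a hypothetical stand-in encoder (character-trigram hashing, not a real model), with a manual dot-product search in place of the library's exact-search wrapper:

```python
class CharNgramModel:
    """Hypothetical custom model: embeds text by hashed character trigrams."""

    DIM = 64

    def _embed(self, text):
        # Count character trigrams into a fixed-size hashed vector, then
        # L2-normalize so that dot product behaves like cosine similarity.
        vec = [0.0] * self.DIM
        text = text.lower()
        for i in range(len(text) - 2):
            vec[hash(text[i:i + 3]) % self.DIM] += 1.0
        norm = sum(v * v for v in vec) ** 0.5 or 1.0
        return [v / norm for v in vec]

    def encode_queries(self, queries, batch_size=16, **kwargs):
        return [self._embed(q) for q in queries]

    def encode_corpus(self, corpus, batch_size=16, **kwargs):
        # Corpus entries follow BEIR's {"title": ..., "text": ...} convention.
        return [self._embed(d.get("title", "") + " " + d["text"]) for d in corpus]

# Manual dot-product search standing in for the framework's search wrapper:
model = CharNgramModel()
docs = [{"title": "", "text": "beir benchmarks retrieval"},
        {"title": "", "text": "unrelated cooking recipe"}]
q_emb = model.encode_queries(["benchmark for retrieval"])[0]
d_embs = model.encode_corpus(docs)
scores = [sum(q * d for q, d in zip(q_emb, e)) for e in d_embs]
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best]["text"])
```

Any encoder exposing these two methods can be plugged into the framework's dense search and scored with the same metrics as the built-in models.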
Maintenance & Community
- Developed through collaboration between UKP Lab (TU Darmstadt), University of Waterloo, and Hugging Face.
- Key contributors include Nandan Thakur, Nils Reimers, and Iryna Gurevych.
- Contact via email or GitHub issues for support.
Licensing & Compatibility
- The repository itself appears to be under a permissive license, but the included datasets are subject to their own terms. The README explicitly states: "It remains the user's responsibility to determine whether you as a user have permission to use the dataset under the dataset's license."
Limitations & Caveats
- Some datasets are not publicly available or require specific reproduction steps.
- The disclaimer emphasizes that users are responsible for dataset licensing and usage rights.