Python toolkit for reproducible information retrieval research
Pyserini is a Python toolkit for reproducible information retrieval (IR) research, enabling efficient first-stage retrieval using both sparse (e.g., BM25, uniCOIL, SPLADE) and dense (e.g., DPR, Contriever, BGE) representations. It targets researchers and practitioners in IR and NLP, offering prebuilt indexes, queries, relevance judgments, and evaluation scripts for numerous standard test collections, simplifying the reproduction of experimental runs.
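To make the sparse side concrete, here is a toy, pure-Python BM25 scorer. It is an illustration under stated assumptions, not Pyserini's or Lucene's implementation. Tokenization is naive lowercased whitespace splitting, k1=0.9 and b=0.4 are illustrative values, and the IDF uses the always-positive log(1 + ...) form. Scores from a real Lucene index will differ because Lucene applies analyzers (stemming, stopword removal) and its own BM25 variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Score every document in `docs` (doc id -> text) against `query`.

    Assumption: whitespace tokenization on lowercased text; repeated
    query terms are counted once.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = score
    return scores


docs = {
    "d1": "dense retrieval with faiss",
    "d2": "sparse retrieval with bm25 and lucene",
    "d3": "bm25 is a sparse ranking function for sparse retrieval",
}
scores = bm25_scores("sparse bm25", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['d3', 'd2', 'd1']
```

Here d3 outranks d2 because it mentions "sparse" twice, which outweighs its slightly longer length, while d1 matches no query term and scores 0.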
How It Works
Pyserini integrates with Anserini (Lucene-based) for sparse retrieval and Faiss for dense retrieval. Because both backends are available, users can run sparse retrieval, dense retrieval, or hybrid sparse-dense fusion that combines the scores from both. The toolkit is designed for ease of use and reproducibility: it is a self-contained Python package with comprehensive documentation and pre-configured experimental setups for a range of corpora.
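The idea behind hybrid sparse-dense fusion can be sketched in plain Python. This is a hypothetical illustration, not Pyserini's hybrid search code. It computes dense scores as inner products over toy embedding vectors (Faiss's inner-product indexes perform this kind of search at scale), min-max normalizes each score list, and linearly interpolates them with a weight `alpha` on the dense side. The normalization scheme, the value of `alpha`, and the rule that a document missing from one list contributes 0 from that side are all assumptions chosen for clarity.

```python
def inner_product(u, v):
    """Dense relevance score: dot product of query and document embeddings."""
    return sum(a * b for a, b in zip(u, v))


def min_max_normalize(scores):
    """Rescale a dict of doc id -> score into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_fuse(sparse, dense, alpha=0.5, k=10):
    """Interpolate normalized sparse and dense scores; return the top-k pairs.

    Assumption: a document absent from one result list gets 0 from that side.
    """
    sparse_n = min_max_normalize(sparse)
    dense_n = min_max_normalize(dense)
    fused = {
        doc_id: (1 - alpha) * sparse_n.get(doc_id, 0.0) + alpha * dense_n.get(doc_id, 0.0)
        for doc_id in set(sparse_n) | set(dense_n)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dimensional embeddings standing in for a dense encoder's output.
query_vec = [0.1, 0.9, 0.2]
doc_vecs = {"d1": [0.0, 1.0, 0.1], "d2": [0.8, 0.1, 0.3], "d3": [0.2, 0.7, 0.4]}
dense = {doc_id: inner_product(query_vec, vec) for doc_id, vec in doc_vecs.items()}

# Hypothetical BM25 scores from the sparse side; d1 was not retrieved.
sparse = {"d2": 7.4, "d3": 6.9, "d4": 3.0}

top = hybrid_fuse(sparse, dense, alpha=0.4)
print([doc_id for doc_id, _ in top])  # ['d3', 'd2', 'd1', 'd4']
```

In this toy run, d2 is the sparse winner and d1 is the dense winner, yet d3 ranks first after fusion because it scores well on both sides. That is the usual motivation for hybrid retrieval.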
Quick Start & Requirements
Install with pip install pyserini. Optional dependencies (e.g., faiss-cpu, lightgbm) can be installed with pip install 'pyserini[optional]'.
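After installing, a quick way to see whether the optional extras are present is to look up their import names without importing them. This is a hypothetical helper, not part of Pyserini. It assumes the import names `faiss` (for the faiss-cpu package) and `lightgbm`, and checks only the two extras named above; the full optional set may be larger.

```python
import importlib.util

# Import names for the optional extras named in the text (assumed list;
# the pip package faiss-cpu is imported as `faiss`).
OPTIONAL_MODULES = ("faiss", "lightgbm")


def missing_optional_deps(modules=OPTIONAL_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_optional_deps()
if missing:
    print("Missing optional dependencies:", ", ".join(missing))
    print("Install them with: pip install 'pyserini[optional]'")
else:
    print("All optional dependencies are available.")
```

`importlib.util.find_spec` only locates a module, so the check stays cheap even when faiss is installed.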
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats