Dense Passage Retriever for open-domain Q&A research
This repository provides tools and pre-trained models for Dense Passage Retrieval (DPR), a state-of-the-art approach for open-domain question answering. It's designed for researchers and practitioners in NLP and information retrieval looking to implement or experiment with dense retrieval systems. The primary benefit is enabling efficient and accurate retrieval of relevant passages for answering questions from large text corpora.
How It Works
DPR uses a bi-encoder architecture in which separate BERT-based encoders process questions and passages. Each encoder produces a dense vector, and retrieval returns the passages whose vectors have the highest dot product with the question vector. FAISS makes this similarity search efficient over large corpora, and the approach outperforms traditional sparse methods such as BM25 on many benchmarks.
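The dot-product ranking step can be illustrated with a minimal, standard-library sketch. It is not the repository's code: the three-dimensional vectors are made-up stand-ins for encoder outputs, the helper names `dot` and `retrieve` are hypothetical, and the brute-force scan takes the place of a FAISS index.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    question_vec: Sequence[float],
    passage_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k passages with the highest dot-product score, best first."""
    scored = ((pid, dot(question_vec, vec)) for pid, vec in passage_vecs.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy embeddings (assumed values; real ones would come from the two encoders).
passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.8, 0.1],
    "p3": [0.7, 0.2, 0.1],
}
question = [1.0, 0.0, 0.0]

top = retrieve(question, passages, k=2)
print([pid for pid, _ in top])
```

Because passage vectors do not depend on the question, they can be computed once and indexed ahead of time; only the question has to be encoded at query time. That separation is what makes the bi-encoder design amenable to a fast inner-product index.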
Quick Start & Requirements
pip install .
python data/download_data.py
Highlighted Details
Maintenance & Community
This project is from Facebook AI Research. Further community engagement details are not specified in the README.
Licensing & Compatibility
Limitations & Caveats
The CC-BY-NC 4.0 license restricts commercial applications. WebQ validation requires entity normalization, which is not currently implemented.