DPR by facebookresearch

Dense Passage Retriever for open-domain Q&A research

created 5 years ago
1,825 stars

Top 24.2% on sourcepulse

Project Summary

This repository provides tools and pre-trained models for Dense Passage Retrieval (DPR), a dense-vector approach to passage retrieval for open-domain question answering. It is aimed at NLP and information retrieval researchers and practitioners who want to implement or experiment with dense retrieval systems. Its main benefit is efficient, accurate retrieval of relevant passages from large text corpora.

How It Works

DPR uses a bi-encoder architecture in which separate BERT-based encoders process questions and passages. Each encoder produces a dense vector, and retrieval returns the passages whose vectors have the highest dot product with the question vector. Because this reduces to a maximum inner product search, it can be run efficiently with FAISS, and it outperforms traditional sparse methods such as BM25 on many benchmarks.
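The scoring step can be illustrated with a minimal sketch. It is not the repository's code: the toy vectors stand in for encoder outputs, the helper names `dot` and `retrieve` are hypothetical, and an exhaustive scan replaces the FAISS index that a real setup would use.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) for the k highest dot-product scores."""
    scored = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    # Higher dot product means a better match, so sort in descending order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional embeddings; real encoders produce much larger vectors.
passage_vecs = [
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.1],
    [0.4, 0.4, 0.2],
]
question_vec = [1.0, 0.0, 0.0]

top = retrieve(question_vec, passage_vecs, k=2)
print(top)
```

Passage vectors do not depend on the question, so they can be encoded and indexed once, and only the question has to be encoded at query time.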

Quick Start & Requirements

  • Install from a local clone of the repository: pip install .
  • Requires Python 3.6+ and PyTorch 1.2.0+.
  • Supports Hugging Face Transformers (<=3.1.0), PyText, and Fairseq encoders.
  • Pre-trained models and datasets can be downloaded using python data/download_data.py.

Highlighted Details

  • State-of-the-art performance on the Natural Questions (NQ) dataset; a newer model reaches 52.47% top-1 passage retrieval accuracy.
  • Hydra-based configuration for command-line tools.
  • Pluggable data processing layer for custom datasets.
  • Inference utilizes FAISS for efficient indexing and retrieval.
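For readers unfamiliar with the metric, here is a rough sketch of how top-k passage retrieval accuracy is commonly computed. It assumes the usual open-domain QA definition: a question counts as a hit if any of its top-k retrieved passages contains an answer. The function name `top_k_accuracy` and the data are hypothetical and not taken from the repository.

```python
from typing import List


def top_k_accuracy(hits_per_question: List[List[bool]], k: int) -> float:
    """Fraction of questions with at least one answer-bearing passage in the top k.

    hits_per_question[q][r] is True when the passage ranked r (0-based)
    for question q contains an answer.
    """
    if not hits_per_question:
        raise ValueError("need at least one question")
    correct = sum(1 for hits in hits_per_question if any(hits[:k]))
    return correct / len(hits_per_question)


# Made-up results for 4 questions with 3 ranked passages each.
hits = [
    [True, False, False],
    [False, True, False],
    [False, False, False],
    [False, False, True],
]

acc_at_1 = top_k_accuracy(hits, k=1)
acc_at_3 = top_k_accuracy(hits, k=3)
print(acc_at_1, acc_at_3)
```

Under this definition, accuracy can only stay the same or rise as k grows, which is why top-1 is the strictest of the commonly reported figures.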

Maintenance & Community

This project is from Facebook AI Research. Further community engagement details are not specified in the README.

Licensing & Compatibility

  • License: CC-BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International).
  • Non-commercial use restriction: This license prohibits commercial use.

Limitations & Caveats

The CC-BY-NC 4.0 license restricts commercial applications. WebQ validation requires entity normalization, which is not currently implemented.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

41 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 11 more.

sentence-transformers by UKPLab

0.2%
17k
Framework for text embeddings, retrieval, and reranking
created 6 years ago
updated 3 days ago