Unsupervised dense information retrieval via contrastive learning
Contriever is an open-source library for unsupervised dense information retrieval, offering pre-trained models and code for training and evaluation. It targets researchers and practitioners in NLP and information retrieval, enabling competitive retrieval performance without supervised data.
How It Works
Contriever pre-trains retrieval models with contrastive learning. A simple contrastive loss teaches the model to produce dense representations of text, so the similarity between a query and a document can be computed efficiently as the dot product of their embeddings. Because training needs no labeled data, the approach is fully unsupervised, yet it is competitive with traditional methods such as BM25, particularly on recall metrics.
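To make the idea of a contrastive loss over dot-product scores concrete, here is a minimal sketch for a single query. It assumes an InfoNCE-style formulation, a common choice for contrastive learning; the text above only says "a simple contrastive loss", so the exact form, the function name `info_nce_loss`, and the `temperature` value are illustrative assumptions rather than the library's code.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style loss for one query (illustrative, not the library's code).

    The loss is the negative log-probability of the positive document
    under a softmax over the dot-product scores of all candidates.
    `temperature` is a hypothetical value chosen for this sketch.
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    logits = [s / temperature for s in scores]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


# Toy embeddings: the positive points the same way as the query.
query = [1.0, 0.0]
positive = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce_loss(query, positive, negatives)
# Swapping in a negative as the "positive" should give a larger loss.
misaligned_loss = info_nce_loss(query, negatives[0], [positive, negatives[1]])
```

Minimizing a loss of this shape pushes the query embedding toward its positive and away from the negatives, which is what makes plain dot products meaningful at retrieval time.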
Quick Start & Requirements
Load a pre-trained model with transformers:
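# Assumes the script runs from the root of the Contriever repository,
# where the src package lives.
from src.contriever import Contriever
from transformers import AutoTokenizer

# Load the pre-trained encoder and its matching tokenizer.
contriever = Contriever.from_pretrained("facebook/contriever")
tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")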
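Once texts have been embedded, retrieval reduces to ranking documents by their dot product with the query embedding. The sketch below illustrates only that ranking step. The vectors are hand-written toy stand-ins for model outputs, and `rank_by_dot_product` is a hypothetical helper, not part of the Contriever codebase.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_dot_product(query_vec, doc_vecs, top_k=3):
    """Return (doc_index, score) pairs for the top_k highest-scoring documents."""
    scored = [(i, dot(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (assumed values, not real model outputs).
query_vec = [0.2, 0.9, 0.1]
doc_vecs = [
    [0.1, 0.8, 0.0],  # document 0
    [0.9, 0.1, 0.3],  # document 1
    [0.0, 0.2, 0.9],  # document 2
]

top = rank_by_dot_product(query_vec, doc_vecs, top_k=2)
for doc_index, score in top:
    print(f"doc {doc_index}: score {score:.2f}")
```

For large collections, the exhaustive loop would normally be replaced by a batched matrix product or an approximate nearest-neighbor index, but the scoring rule stays the same.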
Requires transformers.
Pre-trained models: English (contriever, contriever-msmarco) and multilingual (mcontriever, mcontriever-msmarco).
Highlighted Details
mContriever
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats