Code for retrieval-oriented language model pre-training via masked auto-encoders
RetroMAE provides a codebase for pre-training and fine-tuning retrieval-oriented language models using a Masked Auto-Encoder approach. It targets researchers and practitioners in information retrieval and natural language processing, offering state-of-the-art performance on benchmarks like MS MARCO and BEIR.
How It Works
RetroMAE pre-trains with a masked auto-encoding (MAE) workflow: a full-size BERT-style encoder compresses a moderately masked input into a single sentence embedding, and a lightweight one-layer decoder must reconstruct a much more heavily masked copy of the text from that embedding, which forces the embedding to capture the passage's semantics. The v2 iteration (Duplex MAE) adds a second decoding task on top of this and is designed to enhance the transferability and zero-shot capabilities of dense retrievers, improving performance on both in-domain and out-of-domain datasets.
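To make the mechanism concrete, the sketch below mimics the objective in plain PyTorch: a stand-in encoder sees a moderately masked input and yields a [CLS]-style sentence embedding, while a single-layer decoder has to reconstruct an aggressively masked copy conditioned on that embedding. The dimensions, masking ratios, and module choices are illustrative assumptions, and details such as RetroMAE's enhanced decoding and DupMAE's extra bag-of-words objective are omitted; the repository contains the real training code.

```python
# Toy sketch of the asymmetric masked auto-encoding objective
# (illustrative dimensions; not the repository's actual implementation).
import torch
import torch.nn as nn

vocab_size, hidden, seq_len, heads = 30522, 256, 64, 4
MASK_ID = 103

# A multi-layer encoder stands in for BERT; the decoder is deliberately a single
# layer so that reconstruction has to lean on the sentence embedding.
embed = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(hidden, heads, batch_first=True), num_layers=4)
decoder = nn.TransformerEncoderLayer(hidden, heads, batch_first=True)
lm_head = nn.Linear(hidden, vocab_size)

tokens = torch.randint(0, vocab_size, (2, seq_len))   # original input ids
enc_mask = torch.rand(2, seq_len) < 0.30              # moderate masking for the encoder
dec_mask = torch.rand(2, seq_len) < 0.70              # aggressive masking for the decoder
enc_mask[:, 0] = False                                # position 0 plays the role of [CLS]

# Encoder side: compress the lightly masked input into one sentence embedding.
sentence_emb = encoder(embed(tokens.masked_fill(enc_mask, MASK_ID)))[:, :1, :]

# Decoder side: reconstruct the heavily masked input conditioned on that embedding.
dec_input = embed(tokens.masked_fill(dec_mask, MASK_ID))
dec_states = decoder(torch.cat([sentence_emb, dec_input], dim=1))[:, 1:, :]

loss = nn.functional.cross_entropy(
    lm_head(dec_states)[dec_mask], tokens[dec_mask])  # predict the masked-out tokens
loss.backward()
```

Because the decoder is so shallow and its input is so heavily masked, most of the reconstruction signal has to flow through the sentence embedding, which is what makes the pre-trained encoder useful for dense retrieval.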
Quick Start & Requirements
Install the package with pip install . (or pip install -e . for development). Pre-trained checkpoints are available on the Hugging Face Hub (e.g., Shitao/RetroMAE).
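The snippet below is a minimal usage sketch, assuming the released Shitao/RetroMAE checkpoint loads as a standard BERT-style encoder through Hugging Face transformers and that the [CLS] hidden state is used as the dense text embedding, as described in the paper; the repository's own examples cover fine-tuning and full retrieval pipelines.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the released checkpoint (assumed BERT-compatible).
tokenizer = AutoTokenizer.from_pretrained("Shitao/RetroMAE")
model = AutoModel.from_pretrained("Shitao/RetroMAE")
model.eval()

passages = [
    "RetroMAE pre-trains retrieval-oriented encoders.",
    "MS MARCO is a passage ranking benchmark.",
]
batch = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # Take the [CLS] hidden state as the dense representation of each passage.
    embeddings = model(**batch).last_hidden_state[:, 0]

# Score similarity with the inner product, as is typical for dense retrieval.
print(embeddings[0] @ embeddings[1])
```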
Highlighted Details
Maintenance & Community
The last commit was about 1 year ago and the project is currently inactive.
Licensing & Compatibility
Limitations & Caveats
The pre-training and fine-tuning examples are launched through torchrun commands (PyTorch's distributed launcher).