Atlas: research code for retrieval-augmented language models
Atlas is a research code repository for few-shot learning with retrieval-augmented language models, aimed at NLP researchers and practitioners. It enables joint pre-training of a dense retriever and an encoder-decoder language model, achieving state-of-the-art few-shot results on tasks such as Natural Questions with far fewer parameters than competing large language models.
How It Works
Atlas jointly trains a dense retriever (Contriever) and a fusion-in-decoder (FiD) language model based on T5. Retrieval happens on the fly during both training and inference, backed by a custom distributed GPU index. This design scales to large corpora (up to 400M passages) and supports dynamic index refreshing, keeping the index consistent with the evolving retriever and stabilizing training.
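The retrieve-then-read flow above can be sketched in plain Python. This is a toy illustration, not Atlas's API: the bag-of-words `embed` stands in for Contriever's dense encoder, the linear scan stands in for the distributed GPU index, and the `question: ... context: ...` formatting mimics how FiD pairs the query with each retrieved passage before the decoder fuses their encodings.

```python
# Toy sketch of Atlas-style retrieve-then-read (illustrative names only).
from collections import Counter
from math import sqrt

def embed(text):
    """Stand-in for a dense encoder: L2-normalized bag-of-words vector."""
    counts = Counter(text.lower().split())
    norm = sqrt(sum(v * v for v in counts.values())) or 1.0
    return {w: v / norm for w, v in counts.items()}

def score(q, p):
    """Inner product between two sparse vectors (dot product in Atlas)."""
    return sum(v * p.get(w, 0.0) for w, v in q.items())

def retrieve(query, passages, k=2):
    """Rank passages by similarity to the query; return the top k.
    Atlas does this with a FAISS-style GPU index instead of a linear scan."""
    q = embed(query)
    return sorted(passages, key=lambda p: score(q, embed(p)), reverse=True)[:k]

def fid_inputs(query, passages):
    """Fusion-in-decoder: each passage is encoded jointly with the query;
    the decoder then attends over all encoded (query, passage) pairs."""
    return [f"question: {query} context: {p}" for p in passages]

corpus = [
    "Paris is the capital of France.",
    "The T5 model is an encoder-decoder transformer.",
    "Contriever is a dense retriever trained with contrastive learning.",
]
top = retrieve("what is the capital of France", corpus, k=1)
inputs = fid_inputs("what is the capital of France", top)
```

In the real system both the retriever and the reader are trained jointly, so passage embeddings go stale as the retriever updates — hence the periodic index refreshes mentioned above.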
Quick Start & Requirements
git clone https://github.com/facebookresearch/atlas.git && cd atlas
conda create --name atlas-env python=3.8
conda activate atlas-env
conda install pytorch==1.11.0 cudatoolkit=11.3 -c pytorch
conda install -c pytorch faiss-gpu=1.7.2 cudatoolkit=11.3
pip install -r requirements.txt
Limitations & Caveats
The repository is explicitly marked as "NO LONGER MAINTAINED," so no further updates or bug fixes should be expected. The code is released under a CC-BY-NC license, which prohibits commercial use.