PyTorch implementation of DeepMind's RETRO retrieval-augmented LM
This repository provides a PyTorch implementation of DeepMind's RETRO (Retrieval-Enhanced Transformer) model, designed for language modeling. It targets researchers and engineers aiming to achieve high performance with significantly fewer parameters than traditional large language models by incorporating a retrieval mechanism.
How It Works
RETRO augments a standard Transformer decoder with a retrieval component that fetches relevant text chunks from a large external database. This implementation uses rotary embeddings for positional encoding and Faiss for efficient nearest-neighbor search, deviating slightly from the original paper's use of ScaNN. The architecture supports scaling up to 1000 layers, as suggested by the DeepNet paper, via a use_deepnet flag that improves training stability.
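A minimal sketch of a forward pass, with the constructor keywords and tensor shapes following the project README; the specific hyperparameter values below are illustrative and may differ across versions:

```python
import torch
from retro_pytorch import RETRO

# RETRO decoder with DeepNet-style residual scaling enabled
retro = RETRO(
    chunk_size = 64,                       # retrieval granularity
    max_seq_len = 2048,                    # maximum sequence length
    enc_dim = 896,                         # encoder dimension
    enc_depth = 2,                         # encoder depth
    dec_dim = 768,                         # decoder dimension
    dec_depth = 12,                        # decoder depth
    dec_cross_attn_layers = (3, 6, 9, 12), # layers with chunked cross-attention
    heads = 8,
    dim_head = 64,
    use_deepnet = True                     # DeepNet scaling for very deep stacks
)

# dummy token sequence: (batch, seq_len + 1), one extra token for next-token targets
seq = torch.randint(0, 20_000, (2, 2048 + 1))

# dummy retrieved neighbors:
# (batch, num chunks, neighbors per chunk, chunk length + continuation)
retrieved = torch.randint(0, 20_000, (2, 32, 2, 128))

loss = retro(seq, retrieved, return_loss = True)
loss.backward()
```

Note the neighbor chunk length of 128: each retrieved chunk of 64 tokens is concatenated with its 64-token continuation, as in the paper.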
Quick Start & Requirements
pip install retro-pytorch
Basic usage involves instantiating the RETRO model and passing it token sequences together with retrieved neighbor chunks. The TrainingWrapper simplifies data preprocessing and training-loop setup from a folder of raw text documents, as sketched below.
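A sketch of the TrainingWrapper flow based on the usage shown in the project README; the paths, memmap filenames, and size limits are illustrative, and exact keyword names may vary by version:

```python
import torch
from retro_pytorch import RETRO
from retro_pytorch.training import TrainingWrapper

retro = RETRO(
    chunk_size = 64,
    max_seq_len = 2048,
    enc_dim = 896,
    enc_depth = 2,
    dec_dim = 768,
    dec_depth = 12,
    dec_cross_attn_layers = (3, 6, 9, 12),
    heads = 8,
    dim_head = 64
).cuda()

# the wrapper chunks the text files, builds the Faiss index,
# and precomputes nearest neighbors into memmapped arrays
wrapper = TrainingWrapper(
    retro = retro,
    knn = 2,                                   # neighbors retrieved per chunk
    chunk_size = 64,
    documents_path = './text_folder',          # folder of raw .txt documents
    glob = '**/*.txt',
    chunks_memmap_path = './train.chunks.dat',
    seqs_memmap_path = './train.seq.dat',
    doc_ids_memmap_path = './train.doc_ids.dat',
    max_chunks = 1_000_000,
    max_seqs = 100_000
)

# the dataloader yields (sequence, retrieved neighbors) pairs ready for the model
train_dl = iter(wrapper.get_dataloader(batch_size = 2, shuffle = True))

seq, retrieved = map(lambda t: t.cuda(), next(train_dl))
loss = retro(seq, retrieved, return_loss = True)
loss.backward()
```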
Maintenance & Community
The project is maintained by lucidrains. No specific community channels or roadmap are explicitly mentioned in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The implementation deviates from the original RETRO paper by using rotary embeddings and Faiss in place of ScaNN. The README does not report performance benchmarks or give explicit guidance on the dataset scale required for strong retrieval results.