RETRO-pytorch by lucidrains

PyTorch implementation of DeepMind's RETRO retrieval-augmented LM

created 3 years ago
870 stars

Top 42.2% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of DeepMind's RETRO (Retrieval-Enhanced Transformer) model, designed for language modeling. It targets researchers and engineers aiming to achieve high performance with significantly fewer parameters than traditional large language models by incorporating a retrieval mechanism.

How It Works

RETRO augments a standard Transformer decoder with a retrieval component that fetches relevant text chunks from a large external database. This implementation uses rotary embeddings for positional encoding and Faiss for efficient nearest-neighbor search, deviating slightly from the original paper, which uses ScaNN. Setting the use_deepnet flag enables the approach from the DeepNet paper, which is intended to keep training stable when scaling to as many as 1000 layers.
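To make the retrieval step concrete, here is a minimal conceptual sketch in plain Python. It is not the repository's code: the bag-of-tokens embedding and the brute-force search are stand-ins (assumptions for illustration) for the learned encoder and the Faiss index, and the names chunk_tokens, embed, and nearest_chunks are hypothetical.

```python
import math
from collections import Counter


def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive fixed-size chunks (a ragged tail is dropped)."""
    num_chunks = len(tokens) // chunk_size
    return [tokens[i * chunk_size:(i + 1) * chunk_size] for i in range(num_chunks)]


def embed(chunk):
    """Toy embedding: L2-normalized token counts. A real system would use a neural encoder."""
    counts = Counter(chunk)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def nearest_chunks(query, database, k):
    """Brute-force k-nearest-neighbor search, standing in for an approximate index."""
    q = embed(query)
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine(q, embed(database[i])),
        reverse=True,
    )
    return ranked[:k]


# A tiny "external database" of four chunks of four tokens each.
database = chunk_tokens([1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 9, 10, 11, 12, 13], chunk_size=4)

# Indices of the two database chunks most similar to the query chunk.
neighbors = nearest_chunks([1, 2, 3, 4], database, k=2)
```

In the real model, the retrieved chunks are passed to the decoder alongside the input sequence; at realistic database sizes an approximate index such as Faiss replaces the linear scan shown here.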

Quick Start & Requirements

  • Install: pip install retro-pytorch
  • Prerequisites: PyTorch, Faiss, SentencePiece, NumPy. GPU with CUDA is recommended for practical use.
  • Setup: Basic usage involves instantiating the RETRO model and providing token sequences and retrieved neighbors. The TrainingWrapper simplifies data preprocessing and training loop setup from text documents.
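As a rough guide to what "token sequences and retrieved neighbors" look like, the sketch below computes the expected input shapes. The layout is an assumption based on the RETRO paper's convention (each neighbor is a retrieved chunk plus its continuation, so twice the chunk length) and should be checked against the repository's README; retro_input_shapes is a hypothetical helper, not part of the library.

```python
def retro_input_shapes(batch_size, max_seq_len, chunk_size, num_neighbors):
    """Return (sequence shape, retrieved-neighbors shape) under the assumed layout."""
    if max_seq_len % chunk_size != 0:
        raise ValueError("max_seq_len must be a multiple of chunk_size")

    num_chunks = max_seq_len // chunk_size

    # One extra token so the sequence can be split into inputs and shifted labels.
    seq_shape = (batch_size, max_seq_len + 1)

    # Assumed layout: (batch, chunks, neighbors per chunk, chunk + continuation).
    retrieved_shape = (batch_size, num_chunks, num_neighbors, 2 * chunk_size)
    return seq_shape, retrieved_shape


seq_shape, retrieved_shape = retro_input_shapes(
    batch_size=2, max_seq_len=2048, chunk_size=64, num_neighbors=2
)
```

Working the shapes out up front makes it easier to spot mismatches between the chunk size used to build the retrieval index and the chunk size the model is configured with.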

Highlighted Details

  • Targets the RETRO result of GPT-3-level performance with roughly 10x fewer parameters; this repository publishes no benchmarks of its own.
  • Integrates rotary embeddings and Faiss for retrieval.
  • Supports DeepNet for scaling to 1000 layers.
  • Includes utilities for data preprocessing, indexing, and generation.
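For context on the DeepNet item above, the sketch below shows the residual scaling idea from the DeepNet paper in plain Python. The constants are the paper's decoder-only values; whether this repository uses exactly these values is not confirmed here, and both function names are hypothetical.

```python
def deepnet_decoder_constants(num_layers):
    """Decoder-only DeepNet constants for a stack of num_layers layers.

    alpha scales the residual branch; beta scales the initialization of
    selected projection weights.
    """
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


def deepnet_residual(x, sublayer_out, alpha):
    """Scaled residual sum, alpha * x + f(x), which DeepNet then feeds to LayerNorm."""
    return [alpha * xi + fi for xi, fi in zip(x, sublayer_out)]


# Constants for a 1000-layer decoder.
alpha, beta = deepnet_decoder_constants(1000)

# Example residual sum on a two-element vector with alpha = 2.0.
mixed = deepnet_residual([1.0, 2.0], [0.5, 0.5], alpha=2.0)
```

The up-weighted residual and down-scaled initialization together bound the size of each layer's update, which is what allows post-normalization stacks to train stably at extreme depth.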

Maintenance & Community

The project is maintained by lucidrains. No specific community channels or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The implementation deviates from the original RETRO paper by using rotary embeddings and Faiss. The README does not detail specific performance benchmarks or provide explicit guidance on the scale of datasets required for optimal results.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days
