RETRO-pytorch by lucidrains

PyTorch implementation of DeepMind's RETRO retrieval-augmented LM

Created 4 years ago

876 stars

Top 41.1% on SourcePulse

4 Experts Love This Project

Jesse Clark

Cofounder of Marqo

Jeffrey Quesnelle

Cofounder of Nous Research

Philipp Schmid

DevRel at Google DeepMind

Andreas Blattmann

Cofounder of Black Forest Labs

Project Summary

This repository provides a PyTorch implementation of DeepMind's RETRO (Retrieval-Enhanced Transformer), a language model that retrieves text from an external database while it predicts. It targets researchers and engineers who want strong language-modeling performance with significantly fewer parameters than conventional large language models.

How It Works

RETRO augments a standard Transformer decoder with a retrieval component that fetches relevant text chunks from a large external database. This implementation uses rotary embeddings for positional encoding and Faiss for nearest-neighbor search, deviating slightly from the original paper, which uses ScaNN. For very deep models, setting the use_deepnet flag applies the scheme from the DeepNet paper, which was proposed to stabilize training of Transformers with up to 1000 layers.
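The retrieval step described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the toy `embed` function stands in for a real text encoder, and the brute-force search stands in for the approximate index that Faiss would provide. It is not the repository's code.

```python
import math

def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive fixed-size chunks (a ragged tail is dropped)."""
    num_chunks = len(tokens) // chunk_size
    return [tokens[i * chunk_size:(i + 1) * chunk_size] for i in range(num_chunks)]

def embed(chunk, dim=8):
    """Toy bag-of-tokens embedding (hypothetical stand-in for a real chunk encoder)."""
    vec = [0.0] * dim
    for token in chunk:
        vec[token % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def retrieve_neighbors(query_chunk, database_chunks, k=2):
    """Return indices of the k database chunks closest to the query (brute force)."""
    query_vec = embed(query_chunk)
    ranked = sorted(
        range(len(database_chunks)),
        key=lambda i: squared_l2(query_vec, embed(database_chunks[i])),
    )
    return ranked[:k]

# Usage: chunk an input sequence, then look up neighbors for each chunk.
database = [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
for chunk in split_into_chunks([1, 1, 1, 1, 2, 2, 2, 2], chunk_size=4):
    print(chunk, "->", retrieve_neighbors(chunk, database, k=1))
```

In a real setup the database would hold many more chunks than a linear scan can handle, which is why an approximate nearest-neighbor library is used instead.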

Quick Start & Requirements

Install: pip install retro-pytorch
Prerequisites: PyTorch, Faiss, SentencePiece, NumPy. GPU with CUDA is recommended for practical use.
Setup: Basic usage involves instantiating the RETRO model and providing token sequences and retrieved neighbors. The TrainingWrapper simplifies data preprocessing and training loop setup from text documents.
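To make "token sequences and retrieved neighbors" more concrete, the helper below computes the tensor shapes such a model would plausibly expect. It is a sketch under assumptions not stated on this page: the sequence carries one extra token so inputs and targets can be shifted for the loss, and each retrieved neighbor holds a chunk plus its continuation (twice the chunk size). The function name is hypothetical; check the repository README for the exact shapes.

```python
def retro_input_shapes(batch_size, seq_len, chunk_size, num_neighbors):
    """Return (sequence_shape, retrieved_shape) for one training batch.

    Assumptions (not confirmed by this page):
      - the sequence has seq_len + 1 tokens, for shifted next-token targets
      - each neighbor is a retrieved chunk plus its continuation
    """
    if seq_len % chunk_size != 0:
        raise ValueError("seq_len must be a multiple of chunk_size")
    num_chunks = seq_len // chunk_size
    seq_shape = (batch_size, seq_len + 1)
    retrieved_shape = (batch_size, num_chunks, num_neighbors, 2 * chunk_size)
    return seq_shape, retrieved_shape

seq_shape, retrieved_shape = retro_input_shapes(
    batch_size=2, seq_len=2048, chunk_size=64, num_neighbors=2
)
print(seq_shape)        # (2, 2049)
print(retrieved_shape)  # (2, 32, 2, 128)
```

Under these assumptions, every chunk of the input gets its own set of neighbors, so the retrieved tensor grows with both the number of chunks and the number of neighbors per chunk.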

Highlighted Details

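Implements RETRO, which the original paper reports as comparable to GPT-3 on the Pile with about 25x fewer parameters; the repository publishes no benchmarks of its own.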
Integrates rotary embeddings and Faiss for retrieval.
Offers an optional use_deepnet flag, following the DeepNet paper, to stabilize training of very deep models (up to 1000 layers).
Includes utilities for data preprocessing, indexing, and generation.
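As a rough illustration of the DeepNet item above, the sketch below computes the residual scaling constants that the DeepNet paper gives for a decoder-only stack of M layers: alpha = (2M)^(1/4) and beta = (8M)^(-1/4). Whether this repository uses exactly these values behind use_deepnet is an assumption; the paper lists different constants for encoder-decoder models. The function names are hypothetical.

```python
def deepnet_decoder_constants(num_layers):
    """Decoder-only constants from the DeepNet paper.

    alpha up-weights the residual stream; beta down-scales the
    initialization of selected projection weights.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta

def deepnorm_residual(x, sublayer_out, alpha):
    """The sum inside DeepNorm, alpha * x + f(x); layer norm would follow."""
    return [alpha * xi + fi for xi, fi in zip(x, sublayer_out)]

alpha, beta = deepnet_decoder_constants(num_layers=1000)
print(f"alpha={alpha:.3f}, beta={beta:.4f}")
```

The deeper the stack, the larger alpha and the smaller beta, which keeps each layer's update to the residual stream small relative to the stream itself.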

Maintenance & Community

The project is maintained by lucidrains. No specific community channels or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The implementation deviates from the original RETRO paper by using rotary embeddings for positional encoding and Faiss instead of ScaNN for retrieval. The README provides no performance benchmarks and no guidance on the dataset scale needed for good results.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days