PyTorch implementation of the Reformer, the efficient Transformer
This repository provides a PyTorch implementation of the Reformer model, an efficient Transformer architecture designed for handling long sequences with reduced memory and computational costs. It is suitable for researchers and practitioners working with large-scale sequence modeling tasks, offering significant memory savings over standard Transformers.
How It Works
The core innovation is the use of Locality-Sensitive Hashing (LSH) attention, which approximates the full attention mechanism by hashing queries and keys into buckets, allowing attention to be computed only within buckets. This is combined with reversible layers, which reduce memory usage by recomputing activations during the backward pass instead of storing them, and chunking for feedforward and attention layers to further manage memory.
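To make the bucketing step concrete, here is a minimal sketch of random-rotation LSH in the spirit of the Reformer paper. This is illustrative only; `lsh_bucket` is a hypothetical helper, not the library's internal code:

```python
import torch

def lsh_bucket(x, n_buckets, n_hashes=1):
    """Assign each vector a bucket id via random rotations (sketch).

    x: (batch, seq, dim) query/key vectors; n_buckets must be even.
    Vectors close in angular distance tend to share a bucket, so
    attention can be restricted to within-bucket pairs.
    """
    batch, seq, dim = x.shape
    rotations = torch.randn(dim, n_hashes, n_buckets // 2, device=x.device)
    # Project onto random directions, then take the argmax over
    # [proj, -proj], as in the paper's angular LSH scheme.
    projected = torch.einsum('bsd,dhn->bhsn', x, rotations)
    projected = torch.cat([projected, -projected], dim=-1)
    return projected.argmax(dim=-1)  # (batch, n_hashes, seq) bucket ids
```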
Quick Start & Requirements
```bash
pip install reformer_pytorch
```
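A minimal usage sketch follows; the hyperparameter values are illustrative placeholders, so see the repository README for the full, authoritative argument list:

```python
import torch
from reformer_pytorch import ReformerLM

model = ReformerLM(
    num_tokens = 20000,   # vocabulary size (placeholder)
    dim = 512,
    depth = 6,
    max_seq_len = 8192,
    heads = 8,
    causal = True         # autoregressive language modeling
)

x = torch.randint(0, 20000, (1, 8192)).long()
logits = model(x)         # (1, 8192, 20000)
```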
Highlighted Details
ReformerEncDec: a wrapper for encoder-decoder architectures (minimal sketch below).
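A sketch of how the wrapper is used, following the README; vocabulary sizes, depths, and sequence lengths here are placeholders:

```python
import torch
from reformer_pytorch import ReformerEncDec

enc_dec = ReformerEncDec(
    dim = 512,
    enc_num_tokens = 20000,
    enc_depth = 6,
    enc_max_seq_len = 4096,
    dec_num_tokens = 20000,
    dec_depth = 6,
    dec_max_seq_len = 4096
)

src = torch.randint(0, 20000, (1, 4096)).long()
tgt = torch.randint(0, 20000, (1, 4096)).long()

# Teacher-forced training step on random placeholder data
loss = enc_dec(src, tgt, return_loss = True)
loss.backward()
```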
Maintenance & Community
The project is actively maintained by lucidrains, with contributions from various individuals. Links to community channels or roadmaps are not explicitly provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.
Limitations & Caveats
The README notes a potential instability with the O2 optimization level during mixed-precision training and recommends O1 instead. It also notes that sequence lengths must be divisible by bucket_size * 2; an Autopadder helper is provided to pad inputs to a valid length automatically (see the sketch below).
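A minimal sketch of wrapping a model with Autopadder, following the README; hyperparameters are illustrative:

```python
import torch
from reformer_pytorch import ReformerLM, Autopadder

# Autopadder pads inputs up to a multiple of bucket_size * 2 and strips
# the padding from the output. All hyperparameters are placeholders.
model = Autopadder(ReformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    max_seq_len = 8192,
    causal = True
))

x = torch.randint(0, 20000, (1, 7777)).long()  # 7777 is not a valid length
logits = model(x)                              # returned as (1, 7777, 20000)
```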