reformer-pytorch by lucidrains

PyTorch implementation of the Reformer, the efficient Transformer architecture from the research paper "Reformer: The Efficient Transformer"

created 5 years ago
2,177 stars

Top 21.1% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of the Reformer model, an efficient Transformer architecture designed for handling long sequences with reduced memory and computational costs. It is suitable for researchers and practitioners working with large-scale sequence modeling tasks, offering significant memory savings over standard Transformers.

How It Works

The core innovation is Locality-Sensitive Hashing (LSH) attention, which approximates full attention by hashing queries and keys into buckets so that attention is computed only within each bucket. This is combined with reversible layers, which reduce memory usage by recomputing activations during the backward pass instead of storing them, and with chunked feedforward and attention computation to further limit peak memory.
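As an illustration, the library exposes the LSH attention module on its own. The following is a minimal sketch based on the usage documented in the README; parameter names such as bucket_size and n_hashes follow the project's examples and may differ across versions.

    import torch
    from reformer_pytorch import LSHSelfAttention

    # Standalone LSH self-attention layer: queries/keys are hashed into
    # buckets, and attention is computed only within each bucket.
    attn = LSHSelfAttention(
        dim = 128,
        heads = 8,
        bucket_size = 64,   # average number of queries/keys per bucket
        n_hashes = 8,       # hash rounds; more rounds give a better approximation
        causal = False
    )

    x = torch.randn(10, 1024, 128)  # (batch, seq_len, dim); seq_len divisible by bucket_size * 2
    y = attn(x)                     # (10, 1024, 128)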

Quick Start & Requirements

  • Install via pip: pip install reformer_pytorch (a minimal usage sketch follows this list)
  • Requires PyTorch; a CUDA-capable GPU is recommended for performance.
  • Official documentation and examples are available in the README.
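A minimal language-model sketch, assuming the ReformerLM interface documented in the README; the hyperparameter values here are illustrative, not recommendations.

    import torch
    from reformer_pytorch import ReformerLM

    model = ReformerLM(
        num_tokens = 20000,   # vocabulary size
        dim = 512,
        depth = 6,
        max_seq_len = 8192,
        heads = 8,
        lsh_dropout = 0.1,
        causal = True,        # autoregressive language modeling
        bucket_size = 64,
        n_hashes = 4
    )

    x = torch.randint(0, 20000, (1, 8192))  # token ids, length divisible by bucket_size * 2
    logits = model(x)                       # (1, 8192, 20000)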

Highlighted Details

  • Implements LSH attention, reversible layers, and chunking for efficiency.
  • Supports various positional embeddings (rotary, axial, absolute).
  • Includes optional features like Product Key Memory (PKM), GLU feedforward, and layer dropout.
  • Offers a ReformerEncDec wrapper for encoder-decoder architectures (see the sketch after this list).
  • Compatible with Microsoft's DeepSpeed for distributed training.
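The encoder-decoder wrapper can be sketched roughly as follows. The enc_/dec_ prefixed keyword arguments follow the convention shown in the README's example, but exact names, defaults, and the return_loss training path should be checked against the current release.

    import torch
    from reformer_pytorch import ReformerEncDec

    enc_dec = ReformerEncDec(
        dim = 512,
        enc_num_tokens = 20000,
        enc_depth = 6,
        enc_max_seq_len = 4096,
        dec_num_tokens = 20000,
        dec_depth = 6,
        dec_max_seq_len = 4096
    )

    src = torch.randint(0, 20000, (1, 4096))  # encoder input tokens
    tgt = torch.randint(0, 20000, (1, 4096))  # decoder target tokens

    loss = enc_dec(src, tgt, return_loss = True)  # training step on the returned loss
    loss.backward()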

Maintenance & Community

The project is maintained by lucidrains, with contributions from various individuals, although the most recent commit dates back about two years (see the health check below). Links to community channels or a roadmap are not provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README notes potential instability with the O2 optimization level during mixed-precision training and recommends O1 instead. It also notes that sequence lengths must be divisible by bucket_size * 2; an Autopadder helper is provided to handle the required padding automatically (sketched below).
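A sketch of the Autopadder wrapper mentioned above, following the README's example: it pads inputs up to the next multiple of bucket_size * 2 so arbitrary sequence lengths can be fed in. Details (e.g. mask handling) may vary by version.

    import torch
    from reformer_pytorch import ReformerLM, Autopadder

    model = ReformerLM(
        num_tokens = 20000,
        dim = 512,
        depth = 6,
        max_seq_len = 8192,
        heads = 8,
        causal = True,
        bucket_size = 64
    )
    model = Autopadder(model)  # pads the input to a multiple of bucket_size * 2

    x = torch.randint(0, 20000, (1, 7777))  # length not divisible by bucket_size * 2
    logits = model(x)                       # (1, 7777, 20000), trimmed back to the original length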

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 16 stars in the last 90 days
