Research paper and code for long-range transformers with unlimited-length input
Unlimiformer enables transformer models to process arbitrarily long input sequences by augmenting them with a retrieval-based attention mechanism. This method is designed for researchers and practitioners working with large language models who need to overcome the quadratic complexity limitations of standard attention for extended contexts, offering improved performance on tasks requiring long-range understanding.
How It Works
Unlimiformer integrates a retrieval mechanism into existing encoder-decoder architectures without altering the core mathematical definition of attention. It stores the encoder's hidden states for a long input in an external datastore and, at each attention computation, retrieves only the most relevant stored states via k-nearest-neighbor search. This lets the model attend over an effectively unlimited context length while bypassing the memory and computational constraints of traditional fixed-context transformers.
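The mechanism can be pictured as ordinary attention restricted to the k nearest stored keys. The sketch below is a minimal PyTorch illustration of that idea, not the repository's implementation; the function names (build_datastore, knn_attention) and the exact in-memory search (instead of an approximate nearest-neighbor index) are simplifying assumptions.

```python
# Illustrative sketch of retrieval-augmented attention (not the repo's code).
# Encoder hidden states for a long input are stored in a datastore; each attention
# step retrieves only the top-k nearest states and attends over them.
import torch
import torch.nn.functional as F

def build_datastore(encoder_states: torch.Tensor) -> torch.Tensor:
    """encoder_states: (total_tokens, hidden_dim) from chunked encoding of a long input."""
    # An exact in-memory store is shown for clarity; in practice a k-NN index is used.
    return encoder_states  # (N, d)

def knn_attention(query: torch.Tensor, datastore: torch.Tensor, k: int = 16) -> torch.Tensor:
    """query: (d,) attention query; returns a context vector over the top-k retrieved states."""
    scores = datastore @ query                       # (N,) inner-product similarity
    topk_scores, topk_idx = scores.topk(k)           # keep only the k best-matching states
    retrieved = datastore[topk_idx]                  # (k, d)
    weights = F.softmax(topk_scores / query.shape[-1] ** 0.5, dim=-1)  # scaled softmax over k scores
    return weights @ retrieved                       # (d,) attended context

# Toy usage: 10,000 "token" states of dimension 64, one query vector.
states = torch.randn(10_000, 64)
datastore = build_datastore(states)
context = knn_attention(torch.randn(64), datastore, k=16)
print(context.shape)  # torch.Size([64])
```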
Quick Start & Requirements
Copy the files in src into your project. Set test_unlimiformer=True for inference. For training, use flags like --unlimiformer_training or --random_unlimiformer_training. Use src/run_generation.py for Llama-2 summarization and src/run.py for BART fine-tuning.
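For orientation, the snippet below groups the flags named above and notes what each one selects. The dataclass is hypothetical, not the repository's actual argument class; the real definitions and parsing live in the run scripts, which should be consulted for exact semantics.

```python
# Hypothetical grouping of the Unlimiformer flags mentioned above (illustration only;
# the repo's run scripts define and parse the real arguments).
from dataclasses import dataclass

@dataclass
class UnlimiformerFlags:
    test_unlimiformer: bool = True               # apply retrieval-augmented attention at inference/evaluation
    unlimiformer_training: bool = False          # training-mode variant (see the repo for exact behavior)
    random_unlimiformer_training: bool = False   # alternative training-mode variant (see the repo)
    layer_begin: int = 0                         # layer from which Unlimiformer is applied; tune per model/dataset

# Inference-only configuration, as described in the quick start above.
flags = UnlimiformerFlags(test_unlimiformer=True)
print(flags)
```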
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The effectiveness and optimal configuration (e.g., --layer_begin) of Unlimiformer are highly dependent on the specific model and dataset, requiring empirical tuning. Performance may degrade if datastore or index operations are offloaded from the GPU.
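To illustrate the GPU-placement point, the sketch below builds a nearest-neighbor index over stored hidden states and moves it onto a GPU with FAISS. It assumes a faiss-gpu installation and stands in for the general idea behind the caveat; it is not the repository's own datastore code.

```python
# Keeping the retrieval index on the GPU (assumes faiss-gpu is installed).
# Searching a CPU-resident index instead forces extra host/device transfers,
# which is one way datastore/index offloading can slow generation down.
import numpy as np
import faiss

d = 1024                                               # hidden-state dimension (example value)
keys = np.random.rand(100_000, d).astype("float32")    # stand-in for stored encoder hidden states

cpu_index = faiss.IndexFlatIP(d)                       # exact inner-product index
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # place the index on GPU 0
gpu_index.add(keys)

queries = np.random.rand(4, d).astype("float32")       # stand-in for attention queries
scores, ids = gpu_index.search(queries, 16)            # top-16 neighbors per query
print(ids.shape)                                       # (4, 16)
```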