unlimiformer by abertsch72

Research code for long-range transformers with unlimited-length input

created 2 years ago
1,061 stars

Top 36.2% on sourcepulse

Project Summary

Unlimiformer enables transformer models to process arbitrarily long input sequences by augmenting them with a retrieval-based attention mechanism. This method is designed for researchers and practitioners working with large language models who need to overcome the quadratic complexity limitations of standard attention for extended contexts, offering improved performance on tasks requiring long-range understanding.

How It Works

Unlimiformer integrates a retrieval mechanism into existing encoder-decoder architectures without altering the core mathematical definition of attention. It does this by storing hidden states in an external nearest-neighbor datastore and, for each attention computation, retrieving only the most relevant of those states to attend over. This allows models to attend to effectively unlimited context lengths, bypassing the memory and computational constraints of traditional fixed-context transformers.
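A minimal sketch of the idea, assuming a flat Faiss inner-product index over encoder hidden states; the array sizes, the retrieval_attention helper, and the single-vector query shape are illustrative assumptions, not the repository's API or the paper's exact query/key reformulation:

```python
# Illustrative retrieval-based attention: attend only over the top-k retrieved
# hidden states instead of the full (arbitrarily long) input.
import numpy as np
import faiss

d = 64                                                            # assumed hidden size
encoder_states = np.random.randn(100_000, d).astype("float32")    # long input's hidden states

index = faiss.IndexFlatIP(d)        # inner-product search ~ unnormalized attention scores
index.add(encoder_states)           # datastore holding every encoder hidden state

def retrieval_attention(query: np.ndarray, k: int = 16) -> np.ndarray:
    """Softmax-weighted sum over only the k highest-scoring retrieved states."""
    scores, ids = index.search(query[None, :], k)      # both shaped (1, k)
    weights = np.exp(scores[0] - scores[0].max())
    weights /= weights.sum()                           # softmax over the retrieved keys
    return weights @ encoder_states[ids[0]]            # weighted sum of retrieved values

context = retrieval_attention(np.random.randn(d).astype("float32"))
print(context.shape)  # (64,)
```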

Quick Start & Requirements

  • Install: Copy src files into your project.
  • Prerequisites: Python, PyTorch, Hugging Face Transformers, Faiss. GPU with CUDA is recommended for performance.
  • Usage: Set test_unlimiformer=True for inference. For training, use flags such as --unlimiformer_training or --random_unlimiformer_training (a minimal launch sketch follows this list).
  • Example: See src/run_generation.py for Llama-2 summarization and src/run.py for BART fine-tuning.
  • Docs: Official Implementation
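A minimal launch sketch, assuming src/run.py parses Hugging Face Seq2SeqTrainer-style CLI flags; the model name, output path, and exact flag spellings are assumptions, so treat the repository's configs and README as authoritative:

```python
# Hypothetical launcher for src/run.py; flag spellings and paths are assumptions.
import subprocess

train_cmd = [
    "python", "src/run.py",
    "--model_name_or_path", "facebook/bart-base",   # assumed base checkpoint
    "--do_train", "--do_eval",                      # standard HF trainer flags (assumed)
    "--unlimiformer_training",                      # retrieval training, per the flags above
    "--output_dir", "outputs/unlimiformer-bart",    # assumed output path
]
subprocess.run(train_cmd, check=True)

# Inference with the retrieval datastore enabled (the test_unlimiformer flag above).
eval_cmd = [
    "python", "src/run.py",
    "--model_name_or_path", "outputs/unlimiformer-bart",
    "--do_predict",
    "--test_unlimiformer",
]
subprocess.run(eval_cmd, check=True)
```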

Highlighted Details

  • Supports unlimited input length for pretrained encoder-decoder models.
  • Compatible with Llama-2 and its derivatives.
  • Offers multiple training strategies: retrieval training, random-encoded training, and alternating training.
  • Can utilize a Faiss datastore for managing hidden states, with options to keep the index on the GPU or offload it to the CPU to manage memory (see the sketch below).
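A sketch of that GPU-vs-CPU trade-off using plain Faiss, assuming a flat inner-product index holds the hidden states; the sizes and variable names are illustrative and this is not the repository's datastore code:

```python
# Illustrative only: GPU-resident vs. CPU-offloaded Faiss index for hidden states.
import numpy as np
import faiss

d = 1024                                                   # assumed hidden size
hidden_states = np.random.randn(50_000, d).astype("float32")

cpu_index = faiss.IndexFlatIP(d)                           # CPU index: lower GPU memory
cpu_index.add(hidden_states)                               # pressure, but slower search

if hasattr(faiss, "StandardGpuResources"):                 # only in faiss-gpu builds
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, cpu_index)      # keep retrieval on the GPU
else:
    index = cpu_index                                      # fall back to CPU offloading

query = np.random.randn(1, d).astype("float32")
scores, ids = index.search(query, 16)                      # top-16 nearest hidden states
```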

Maintenance & Community

  • Official implementation for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input".
  • Authors can be contacted via GitHub issues or email.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The underlying models referenced (e.g., Llama-2, BART) have their own licenses. Users should verify compatibility with their intended use case.

Limitations & Caveats

The effectiveness and optimal configuration (e.g., --layer_begin) of Unlimiformer are highly dependent on the specific model and dataset, requiring empirical tuning. Performance may degrade if datastore or index operations are offloaded from the GPU.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Matei Zaharia (Cofounder of Databricks), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

  • LWM by LargeWorldModel: Multimodal autoregressive model for long-context video/text. 7k stars; created 1 year ago; updated 9 months ago.