Research paper and code for long-range transformers with unlimited-length input
Unlimiformer enables transformer models to process arbitrarily long input sequences by augmenting them with a retrieval-based attention mechanism. This method is designed for researchers and practitioners working with large language models who need to overcome the quadratic complexity limitations of standard attention for extended contexts, offering improved performance on tasks requiring long-range understanding.
How It Works
Unlimiformer integrates a retrieval mechanism into existing encoder-decoder architectures without altering the core mathematical definition of attention. It stores the encoder's hidden states for a long input in an external datastore and, at each attention computation, retrieves only the most relevant stored states via k-nearest-neighbor search. This lets the model attend over an effectively unlimited context length while bypassing the memory and computational constraints of traditional fixed-context transformers.
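The mechanism can be pictured as ordinary attention restricted to the k nearest stored keys. The sketch below is a minimal PyTorch illustration of that idea, not the repository's implementation; the function names (build_datastore, knn_attention) and the exact in-memory search (instead of an approximate nearest-neighbor index) are simplifying assumptions.

```python
# Illustrative sketch of retrieval-augmented attention (not the repo's code).
# Encoder hidden states for a long input are stored in a datastore; each attention
# step retrieves only the top-k nearest states and attends over them.
import torch
import torch.nn.functional as F

def build_datastore(encoder_states: torch.Tensor) -> torch.Tensor:
    """encoder_states: (total_tokens, hidden_dim) from chunked encoding of a long input."""
    # An exact in-memory store is shown for clarity; in practice a k-NN index is used.
    return encoder_states  # (N, d)

def knn_attention(query: torch.Tensor, datastore: torch.Tensor, k: int = 16) -> torch.Tensor:
    """query: (d,) attention query; returns a context vector over the top-k retrieved states."""
    scores = datastore @ query                       # (N,) inner-product similarity
    topk_scores, topk_idx = scores.topk(k)           # keep only the k best-matching states
    retrieved = datastore[topk_idx]                  # (k, d)
    weights = F.softmax(topk_scores / query.shape[-1] ** 0.5, dim=-1)  # scaled softmax over k scores
    return weights @ retrieved                       # (d,) attended context

# Toy usage: 10,000 "token" states of dimension 64, one query vector.
states = torch.randn(10_000, 64)
datastore = build_datastore(states)
context = knn_attention(torch.randn(64), datastore, k=16)
print(context.shape)  # torch.Size([64])
```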
Quick Start & Requirements
Copy the files in src into your project. Set test_unlimiformer=True for inference. For training, use flags like --unlimiformer_training or --random_unlimiformer_training. Use src/run_generation.py for Llama-2 summarization and src/run.py for BART fine-tuning.
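For orientation, the snippet below groups the flags named above and notes what each one selects. The dataclass is hypothetical, not the repository's actual argument class; the real definitions and parsing live in the run scripts, which should be consulted for exact semantics.

```python
# Hypothetical grouping of the Unlimiformer flags mentioned above (illustration only;
# the repo's run scripts define and parse the real arguments).
from dataclasses import dataclass

@dataclass
class UnlimiformerFlags:
    test_unlimiformer: bool = True               # apply retrieval-augmented attention at inference/evaluation
    unlimiformer_training: bool = False          # training-mode variant (see the repo for exact behavior)
    random_unlimiformer_training: bool = False   # alternative training-mode variant (see the repo)
    layer_begin: int = 0                         # layer from which Unlimiformer is applied; tune per model/dataset

# Inference-only configuration, as described in the quick start above.
flags = UnlimiformerFlags(test_unlimiformer=True)
print(flags)
```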
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The effectiveness and optimal configuration (e.g., --layer_begin) of Unlimiformer are highly dependent on the specific model and dataset, requiring empirical tuning. Performance may degrade if datastore or index operations are offloaded from the GPU.
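To illustrate the GPU-placement point, the sketch below builds a nearest-neighbor index over stored hidden states and moves it onto a GPU with FAISS. It assumes a faiss-gpu installation and stands in for the general idea behind the caveat; it is not the repository's own datastore code.

```python
# Keeping the retrieval index on the GPU (assumes faiss-gpu is installed).
# Searching a CPU-resident index instead forces extra host/device transfers,
# which is one way datastore/index offloading can slow generation down.
import numpy as np
import faiss

d = 1024                                               # hidden-state dimension (example value)
keys = np.random.rand(100_000, d).astype("float32")    # stand-in for stored encoder hidden states

cpu_index = faiss.IndexFlatIP(d)                       # exact inner-product index
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # place the index on GPU 0
gpu_index.add(keys)

queries = np.random.rand(4, d).astype("float32")       # stand-in for attention queries
scores, ids = gpu_index.search(queries, 16)            # top-16 neighbors per query
print(ids.shape)                                       # (4, 16)
```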