Self-Extend: LLM context window extension via self-attention
This repository provides an implementation of Self-Extend, a method to significantly extend the context window of Large Language Models (LLMs) without requiring any fine-tuning. It targets researchers and practitioners working with LLMs who need to process longer sequences, offering a way to leverage the inherent long-context capabilities of existing models.
How It Works
Self-Extend operates by constructing bi-level attention information: neighbor-level attention, which keeps exact relative positions for nearby tokens, and group-level attention, which maps distant tokens onto coarser position groups. Both are computed with the model's existing self-attention mechanism, so no training is necessary. This approach stimulates the LLM's inherent potential for handling longer contexts by intelligently structuring attention across segments of the input sequence.
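As an illustration only (not the repository's implementation), the sketch below shows one way the bi-level relative positions could be computed: exact positions inside a neighbor window, floor-divided group positions outside it, with an offset so the two ranges join up. The function name and NumPy formulation are assumptions made for this example.

```python
import numpy as np

def bi_level_relative_positions(seq_len, group_size, window_size):
    """Sketch of Self-Extend-style position remapping (illustrative only).

    Keys within `window_size` of the query keep their exact relative position
    (neighbor attention); more distant keys are mapped onto coarser groups via
    floor division by `group_size` (grouped attention), shifted so the grouped
    range continues where the neighbor window ends.
    """
    q = np.arange(seq_len)[:, None]  # query positions (rows)
    k = np.arange(seq_len)[None, :]  # key positions (columns)
    rel = q - k                      # ordinary relative positions

    grouped = q // group_size - k // group_size
    grouped = grouped + window_size - window_size // group_size  # align the two ranges

    # Exact positions inside the neighbor window, grouped positions outside it.
    return np.where(rel <= window_size, rel, grouped)

# Example: a short causal sequence with a small group size and window.
print(np.tril(bi_level_relative_positions(seq_len=8, group_size=2, window_size=3)))
```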
Quick Start & Requirements
Requirements: `transformers==4.38.2` and `flash_attn==2.5.6`. A Docker image (`hoytjin/selfextend_docker:v0.1`) is recommended to avoid environment issues.

Apply Self-Extend to a loaded model with `SelfExtend.apply(loaded_model, group_size, window_size, enable_flash_attention=False)`, then run `python example.py` for a quick demo; a minimal usage sketch follows below.
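The following is a hedged usage sketch, assuming a Hugging Face causal LM checkpoint; the model name and the specific `group_size`/`window_size` values are placeholders chosen for illustration, not recommendations from the repository.

```python
# Usage sketch: model name and hyperparameter values below are placeholders.
import SelfExtend
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Patch the loaded model in place using the documented call:
# SelfExtend.apply(loaded_model, group_size, window_size, enable_flash_attention=False)
SelfExtend.apply(model, 8, 1024, enable_flash_attention=False)

prompt = "Summarize the following document: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```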
Highlighted Details
Behavior is controlled by the `group_size` and `neighbor_window` hyperparameters; empirical guidance for selecting them is provided.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The effectiveness and optimal hyperparameter selection (`group_size`, `neighbor_window`) can depend on the specific model and task, with empirical rules provided as guidance. While FlashAttention is supported, its full functionality for the decoding stage is still under active debugging.