datamllab / Self-Extend: LLM context window extension via self-attention
Top 50.8% on SourcePulse
This repository provides an implementation of Self-Extend, a method to significantly extend the context window of Large Language Models (LLMs) without requiring any fine-tuning. It targets researchers and practitioners working with LLMs who need to process longer sequences, offering a way to leverage the inherent long-context capabilities of existing models.
How It Works
Self-Extend constructs bi-level attention using the model's existing self-attention mechanism, so no training is necessary: neighbor-level attention keeps exact relative positions for nearby tokens, while group-level attention maps distant tokens onto coarser positions via floor division, keeping them within the position range seen during pretraining. By structuring attention across segments of the input in this way, the method draws out the LLM's inherent capacity for handling longer contexts.
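As a rough illustration of the idea (a sketch, not the repository's implementation), the remapping of relative positions can be written as:

```python
import torch

def self_extend_rel_positions(seq_len: int, group_size: int, window_size: int) -> torch.Tensor:
    """Sketch: build the remapped relative-position matrix for the two attention levels.

    - Neighbor level: tokens within `window_size` keep their exact relative positions.
    - Group level: more distant tokens get coarse positions via floor division by
      `group_size`, so they stay inside the range seen during pretraining.
    """
    q_pos = torch.arange(seq_len).unsqueeze(1)   # query positions (column)
    k_pos = torch.arange(seq_len).unsqueeze(0)   # key positions (row)
    rel = q_pos - k_pos                          # standard relative positions

    # Group-level positions: floor-divide, then shift so they continue from the
    # end of the neighbor window.
    grouped = rel // group_size + (window_size - window_size // group_size)

    # Exact positions inside the neighbor window, grouped positions outside it.
    return torch.where(rel < window_size, rel, grouped)
```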
Quick Start & Requirements
- Requirements: transformers==4.38.2, flash_attn==2.5.6. A Docker image (hoytjin/selfextend_docker:v0.1) is recommended to avoid environment issues.
- Apply to a loaded model with SelfExtend.apply(loaded_model, group_size, window_size, enable_flash_attention=False).
- Run the example with python example.py.
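A minimal usage sketch based on the call above; the model name, hyperparameter values, and prompt are illustrative assumptions, not values prescribed by the repository.

```python
# Sketch: patch a Hugging Face model with Self-Extend, then generate.
import SelfExtend  # module shipped with the repository
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumption: any supported HF model
tokenizer = AutoTokenizer.from_pretrained(model_name)
loaded_model = AutoModelForCausalLM.from_pretrained(model_name)

# Patch self-attention in place; arguments follow the documented signature
# (group_size=8 and window_size=1024 are example values, not recommendations).
SelfExtend.apply(loaded_model, 8, 1024, enable_flash_attention=False)

inputs = tokenizer("Summarize the following long report: ...", return_tensors="pt")
outputs = loaded_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```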
Highlighted Details
The key tunables are the group_size and neighbor_window hyperparameters.
Maintenance & Community
The repository was last updated about a year ago and currently appears inactive.
Licensing & Compatibility
Limitations & Caveats
The effectiveness and the optimal choice of hyperparameters (group_size, neighbor_window) depend on the specific model and task; the repository offers empirical rules of thumb as guidance. FlashAttention is supported, but its decoding-stage path is still under active debugging.
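As a rough sanity check, the position remapping sketched earlier implies an upper bound on usable context of roughly (pretraining_length - window_size) * group_size + window_size tokens. The helper below is hypothetical, derived from that sketch rather than taken from the repository.

```python
def max_extended_context(pretrain_len: int, group_size: int, window_size: int) -> int:
    # Largest relative position r whose grouped value still fits in the pretraining
    # window: r // group_size + window_size - window_size // group_size < pretrain_len
    max_rel = (pretrain_len - window_size + window_size // group_size) * group_size - 1
    return max_rel + 1  # tokens a query can attend over, including itself

# Example: a 4096-token pretraining window with group_size=8 and window_size=1024
print(max_extended_context(4096, 8, 1024))  # 25600
```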