Research paper adapting LMs for long context compression
This repository provides the official implementation for "Adapting Language Models to Compress Long Contexts," enabling language models to compress extensive context into summary vectors and reason over them. It targets researchers and practitioners working with long-context NLP tasks, offering a method to overcome context length limitations in transformer models.
How It Works
The core innovation is the "AutoCompressor" architecture, which integrates a context compression mechanism directly into the language model. This is achieved by training the model to generate a fixed-size set of "summary vectors" from segments of the input context. These summary vectors are then prepended to subsequent segments as soft prompts, allowing the model to retain and reason over information from much longer contexts than its native architecture would typically support. This approach avoids the quadratic complexity of standard attention mechanisms for long sequences.
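The segment-wise recursion is easiest to see in a toy sketch. The code below is illustrative only (a small PyTorch encoder stands in for the real language model, and all names and sizes are invented for the example); it is not the repository's implementation:

import torch
import torch.nn as nn

d_model, k, seg_len = 64, 4, 32                              # toy sizes
encoder = nn.TransformerEncoder(                             # stand-in for the base LM
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
summary_tokens = nn.Parameter(torch.randn(1, k, d_model))    # learned summary-token embeddings

def compress(segments):
    # segments: list of (1, seg_len, d_model) embedded context segments
    summary = torch.zeros(1, 0, d_model)                     # empty soft prompt before the first segment
    for seg in segments:
        inp = torch.cat([summary, seg, summary_tokens], dim=1)  # prepend old summary, append summary tokens
        hidden = encoder(inp)
        summary = hidden[:, -k:, :]                          # hidden states at the summary positions become the new summary vectors
    return summary                                           # fixed-size soft prompt for downstream reasoning

soft_prompt = compress([torch.randn(1, seg_len, d_model) for _ in range(3)])
print(soft_prompt.shape)                                     # torch.Size([1, 4, 64])

In the actual AutoCompressor the base model's own embeddings and layers play the role of the encoder above, and summary vectors can also be accumulated across segments rather than replaced; see the paper for details.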
Quick Start & Requirements
Install the dependencies:
pip install packaging transformers==4.34.0 datasets==2.13.4 accelerate==0.24.1 sentencepiece==0.1.99 flash-attn==2.3.5 wandb
and the rotary-embedding kernels from flash-attention:
pip install git+https://github.com/Dao-AILab/flash-attention.git#subdirectory=csrc/rotary
Running the models requires a GPU with torch.bfloat16 support and a flash-attn installation with CUDA_HOME pointing to the matching CUDA toolkit (the models are loaded in torch.bfloat16).
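A minimal usage sketch follows. It assumes the repository's auto_compressor module and the princeton-nlp/AutoCompressor-Llama-2-7b-6k checkpoint; the class and keyword names (LlamaAutoCompressorModel, output_softprompt, softprompt) are recalled from the upstream README and should be verified there before use:

import torch
from transformers import AutoTokenizer
from auto_compressor import LlamaAutoCompressorModel         # assumed module/class name from this repo

ckpt = "princeton-nlp/AutoCompressor-Llama-2-7b-6k"          # assumed released checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = LlamaAutoCompressorModel.from_pretrained(ckpt, torch_dtype=torch.bfloat16).eval().cuda()

# Compress a long context into summary vectors (a soft prompt) ...
context_ids = tokenizer("A very long document ...", return_tensors="pt").input_ids.cuda()
summary_vectors = model(context_ids, output_softprompt=True).softprompt   # assumed kwarg/attribute

# ... then condition a short prompt on them instead of the full context.
prompt_ids = tokenizer("Question: ...\nAnswer:", return_tensors="pt").input_ids.cuda()
logits = model(prompt_ids, softprompt=summary_vectors).logits             # assumed kwarg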
Highlighted Details
Maintenance & Community
The project is associated with Princeton University NLP research. For questions or bugs, users can contact the authors via email or open an issue on GitHub.
Licensing & Compatibility
The repository code is likely under a permissive license (e.g., MIT, Apache 2.0), but the underlying base models (Llama-2, OPT) have their own licenses. Llama-2's license has restrictions on commercial use for very large companies. Compatibility with closed-source linking depends on the base model licenses.
Limitations & Caveats
Flash Attention requires specific CUDA versions and hardware, and its use with use_cache=True during evaluation might be unstable. The project relies on specific versions of libraries, potentially leading to compatibility issues with newer releases.
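If cached generation misbehaves, a simple workaround is to disable the key/value cache via standard transformers options (a sketch, with model and prompt_ids as in the earlier example):

model.config.use_cache = False                               # standard transformers config flag
output_ids = model.generate(prompt_ids, max_new_tokens=64, use_cache=False)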