Context compressor for LLM inference efficiency (EMNLP 2023)
Selective Context addresses the challenge of LLM context window limitations by compressing input text, enabling models to process twice the content while reducing memory and GPU usage by 40%. It is designed for researchers and practitioners working with long documents or extended conversations, offering significant efficiency gains without performance degradation.
How It Works
The core approach involves evaluating the informativeness of lexical units (sentences, phrases, or tokens) within a given context. It uses a base language model to compute self-information scores for these units, effectively identifying and retaining the most crucial information while discarding less relevant content. This method maximizes the utility of fixed context lengths in LLMs.
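The scoring idea can be illustrated with a minimal sketch. The real implementation computes token probabilities with a causal LM such as GPT-2 and merges token scores into phrase scores; the toy probability table, the `compress` helper, and the threshold below are invented here purely for illustration:

```python
import math

def compress(units, prob_fn, reduce_ratio=0.35):
    """Drop the `reduce_ratio` fraction of lexical units with the LOWEST
    self-information I(u) = -ln p(u), i.e. the most predictable content."""
    scores = [-math.log(prob_fn(u)) for u in units]
    n_drop = int(len(units) * reduce_ratio)
    # Indices of the least informative units (smallest self-information).
    drop_idx = set(sorted(range(len(units)), key=scores.__getitem__)[:n_drop])
    # Keep the rest in their original order.
    return [u for i, u in enumerate(units) if i not in drop_idx]

# Toy "language model": filler phrases are highly probable (low
# self-information), specific content phrases are improbable (high).
toy_probs = {
    "as a matter of fact": 0.20,
    "it should be noted that": 0.15,
    "the reactor": 0.01,
    "shut down": 0.02,
    "at 14:05 UTC": 0.005,
}
phrases = list(toy_probs)
kept = compress(phrases, toy_probs.get, reduce_ratio=0.4)
print(kept)  # the two filler phrases are pruned, content survives
```

With a 0.4 reduction ratio, the two filler phrases score lowest and are discarded, while the specific content phrases are retained in order.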
Quick Start & Requirements
Install the package and the spaCy tokenizer model:

pip install selective-context
python -m spacy download en_core_web_sm    (use zh_core_web_sm for Chinese)

To launch the demo app:

streamlit run app/app.py
Limitations & Caveats
The repository focuses on specific models and languages (e.g., GPT-2, English/Chinese) for its evaluations, and reproduction of paper experiments requires downloading custom datasets.