Context compressor for LLM inference efficiency (EMNLP 2023)
Selective Context addresses the challenge of LLM context window limitations by compressing input text, enabling models to process twice the content while reducing memory and GPU usage by 40%. It is designed for researchers and practitioners working with long documents or extended conversations, offering significant efficiency gains without performance degradation.
How It Works
The core approach involves evaluating the informativeness of lexical units (sentences, phrases, or tokens) within a given context. It uses a base language model to compute self-information scores for these units, effectively identifying and retaining the most crucial information while discarding less relevant content. This method maximizes the utility of fixed context lengths in LLMs.
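The scoring idea can be illustrated with a minimal sketch. The real implementation computes token probabilities with a causal LM such as GPT-2 and merges token scores into phrase scores; the toy probability table, the `compress` helper, and the threshold below are invented here purely for illustration:

```python
import math

def compress(units, prob_fn, reduce_ratio=0.35):
    """Drop the `reduce_ratio` fraction of lexical units with the LOWEST
    self-information I(u) = -ln p(u), i.e. the most predictable content."""
    scores = [-math.log(prob_fn(u)) for u in units]
    n_drop = int(len(units) * reduce_ratio)
    # Indices of the least informative units (smallest self-information).
    drop_idx = set(sorted(range(len(units)), key=scores.__getitem__)[:n_drop])
    # Keep the rest in their original order.
    return [u for i, u in enumerate(units) if i not in drop_idx]

# Toy "language model": filler phrases are highly probable (low
# self-information), specific content phrases are improbable (high).
toy_probs = {
    "as a matter of fact": 0.20,
    "it should be noted that": 0.15,
    "the reactor": 0.01,
    "shut down": 0.02,
    "at 14:05 UTC": 0.005,
}
phrases = list(toy_probs)
kept = compress(phrases, toy_probs.get, reduce_ratio=0.4)
print(kept)  # the two filler phrases are pruned, content survives
```

With a 0.4 reduction ratio, the two filler phrases score lowest and are discarded, while the specific content phrases are retained in order.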
Quick Start & Requirements
Install the package and the spaCy tokenizer model:

pip install selective-context
python -m spacy download en_core_web_sm    (use zh_core_web_sm for Chinese)

To launch the demo app:

streamlit run app/app.py
Limitations & Caveats
The repository focuses on specific models and languages (e.g., GPT-2, English/Chinese) for its evaluations, and reproduction of paper experiments requires downloading custom datasets.