Research paper implementation for prompt compression via learned "gist" tokens
This repository provides code and data for "Learning to Compress Prompts with Gist Tokens," a method for reducing prompt length in large language models. It's targeted at researchers and practitioners looking to optimize LLM inference efficiency and cost. The core benefit is enabling LLMs to retain performance with significantly shorter prompts by learning "gist tokens."
How It Works
The approach introduces "gist tokens" that are trained to compress prompt context into their activations. During instruction finetuning, the attention mask is modified so that tokens after the gist tokens cannot attend back to the original prompt, forcing the prompt's information to flow through the gist tokens. At inference time, the gist tokens' activations can be cached and substituted for the full prompt, so the model processes a much shorter input while preserving the essential information.
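The masking idea can be sketched as follows. This is a minimal illustration, not the repository's actual implementation; the function name and tensor layout are assumptions.

```python
import torch

def make_gist_mask(seq_len: int, gist_start: int, gist_end: int) -> torch.Tensor:
    """Build a boolean attention mask (True = may attend) for one sequence.

    Layout assumption: [prompt tokens][gist tokens][completion tokens].
    Tokens after the gist span may attend to the gist tokens and to each
    other, but not to the original prompt, so the prompt's information
    must flow through the gist activations.
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Queries positioned after the gist span...
    post_gist = torch.arange(seq_len) >= gist_end
    # ...are blocked from attending to keys before the gist span.
    pre_gist = torch.arange(seq_len) < gist_start
    mask[post_gist.unsqueeze(1) & pre_gist.unsqueeze(0)] = False
    return mask

# Example: 6 prompt tokens, 2 gist tokens, 4 completion tokens.
mask = make_gist_mask(seq_len=12, gist_start=6, gist_end=8)
```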
Quick Start & Requirements
Install dependencies:

pip install -r requirements.txt

The codebase pins specific dependency versions, including commit fb366b9a and DeepSpeed 0.8.3. LLaMA-7B experiments expect the model weights to be placed in a llama-7b directory.
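If the pinned commit fb366b9a refers to a Hugging Face Transformers revision (an assumption; check the repository's requirements file), installing the pinned versions would look like:

pip install git+https://github.com/huggingface/transformers.git@fb366b9a
pip install deepspeed==0.8.3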
Highlighted Details
Compression benchmarking currently supports batch_size = 1 for LLaMA-7B. Training and evaluation cover three conditions (gist, pos_control, neg_control).
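Based on the paper's experimental design (an interpretation, not the repository's exact flag handling), gist applies the gist mask, pos_control keeps standard causal attention as an upper bound, and neg_control blocks attention to the prompt entirely as a lower bound. A minimal sketch reusing the hypothetical make_gist_mask helper above:

```python
import torch

def make_condition_mask(condition: str, seq_len: int, gist_start: int, gist_end: int):
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    if condition == "gist":
        # Prompt information must pass through the gist tokens.
        return make_gist_mask(seq_len, gist_start, gist_end)
    if condition == "pos_control":
        # Standard causal attention: the full prompt stays visible.
        return causal
    if condition == "neg_control":
        # Everything from the gist tokens onward is cut off from the
        # prompt, so no prompt information gets through at all.
        causal[gist_start:, :gist_start] = False
        return causal
    raise ValueError(f"unknown condition: {condition}")
```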
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Compression benchmarking currently supports batch_size = 1 for LLaMA-7B; larger batch sizes require modifying the code.
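This restriction is consistent with how gist caching typically works: each example compresses to its own key/value activations, which are precomputed and reused per prompt. A rough sketch of the idea using standard Hugging Face APIs, with a hypothetical model path and an assumed cache layout (not the repository's actual benchmarking code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path; substitute a gist-finetuned checkpoint.
model = AutoModelForCausalLM.from_pretrained("llama-7b")
tokenizer = AutoTokenizer.from_pretrained("llama-7b")

@torch.no_grad()
def cache_gist_activations(prompt_with_gists: str, n_gist: int = 2):
    """Run the prompt once and keep only the gist tokens' KV cache.

    Sketch only: assumes the gist tokens sit at the end of the input and
    that past_key_values uses the legacy layout of (key, value) pairs per
    layer, each with shape [batch, heads, seq, head_dim].
    """
    ids = tokenizer(prompt_with_gists, return_tensors="pt").input_ids
    out = model(ids, use_cache=True)
    # Slice the sequence dimension down to the trailing gist positions;
    # later calls can pass this as past_key_values instead of the prompt.
    return tuple(
        (k[..., -n_gist:, :], v[..., -n_gist:, :])
        for k, v in out.past_key_values
    )
```

Reusing such caches across a batch of distinct prompts requires per-example bookkeeping (positions, cache lengths), which is one plausible reason the released code fixes batch_size = 1.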