gisting by jayelm

Research paper implementation for prompt compression via learned "gist" tokens

Created 2 years ago · 289 stars · Top 91.9% on sourcepulse

View on GitHub
Project Summary

This repository provides code and data for "Learning to Compress Prompts with Gist Tokens," a method for reducing prompt length in large language models. It's targeted at researchers and practitioners looking to optimize LLM inference efficiency and cost. The core benefit is enabling LLMs to retain performance with significantly shorter prompts by learning "gist tokens."

How It Works

The approach inserts learned "gist tokens" after the prompt. During training, a modified attention mask prevents tokens that follow the gist tokens from attending to the original prompt, so all of the prompt's information must flow through the gist tokens. At inference, the gist tokens' activations can be computed once, cached, and reused in place of the full prompt, letting the model process a much shorter input while preserving the essential information.
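
A minimal sketch of the masking idea, assuming a decoder-only model with a boolean attention mask (the function and argument names here are illustrative, not the repo's API):

    import torch

    def gist_attention_mask(seq_len: int, gist_start: int, gist_end: int) -> torch.Tensor:
        """Causal attention mask with gist masking (True = may attend).

        Positions at or after gist_end may not attend to anything before
        gist_start, so prompt information can only reach later tokens
        through the gist tokens themselves.
        """
        # Standard causal (lower-triangular) mask.
        mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
        # Block attention from post-gist positions back to pre-gist positions.
        mask[gist_end:, :gist_start] = False
        return mask

    # Example: 4 prompt tokens, 1 gist token, 3 continuation tokens.
    print(gist_attention_mask(8, gist_start=4, gist_end=5).int())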

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires specific versions: Transformers commit fb366b9a, DeepSpeed 0.8.3.
  • LLaMA-7B experiments require base model weights in a llama-7b directory.
  • Weights & Biases account needed for training.
  • See Hugging Face for checkpoints; for LLaMA-7B these are distributed as weight diffs (a sketch of applying one follows this list).
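
The repo provides its own script for applying the diffs; the following is only a conceptual sketch of what an additive weight diff involves (function names and paths are placeholders, not the repo's actual script):

    import torch

    def apply_weight_diff(base_sd: dict, diff_sd: dict) -> dict:
        """Reconstruct fine-tuned weights as base + diff, tensor by tensor.

        Conceptual sketch only; a real script would also check that the
        two state dicts line up and verify the result against a checksum.
        """
        return {name: base_sd[name] + diff for name, diff in diff_sd.items()}

    # Hypothetical usage (paths are placeholders):
    # base = torch.load("llama-7b/pytorch_model.bin")
    # diff = torch.load("gist-llama-diff/pytorch_model.bin")
    # torch.save(apply_weight_diff(base, diff), "gist-llama/pytorch_model.bin")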

Highlighted Details

  • Gist compression currently supports batch_size = 1 for LLaMA-7B.
  • Checkpoints are provided as weight diffs for LLaMA-7B.
  • Includes scripts for applying weight diffs and for running inference with gist caching (see the sketch after this list).
  • Training supports configurable gist tokens and masking conditions (gist, pos_control, neg_control).
  • Multi-GPU training with DeepSpeed is supported for larger models.
  • Benchmarking functionality is included to measure performance.
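
As an illustration of the caching mechanism, here is a minimal sketch assuming the pinned, pre-Cache-class Transformers API, where past_key_values is a tuple of per-layer (key, value) tensors; it is not the repo's actual implementation:

    import torch

    @torch.no_grad()
    def build_gist_cache(model, prompt_ids, gist_ids):
        """Encode prompt + gist tokens once, then keep only the KV-cache
        entries at the gist positions. Decoding can reuse this small
        cache in place of re-encoding the full prompt.
        """
        input_ids = torch.cat([prompt_ids, gist_ids], dim=-1)  # [1, P + G]
        out = model(input_ids=input_ids, use_cache=True)
        n_gist = gist_ids.shape[-1]
        # Keys/values have shape [batch, heads, seq, head_dim]; keep only
        # the trailing gist positions along the sequence dimension.
        return tuple(
            (k[:, :, -n_gist:, :], v[:, :, -n_gist:, :])
            for k, v in out.past_key_values
        )

Subsequent tokens are then fed with this trimmed cache as past_key_values; position ids must still account for the discarded prompt positions, a detail the repo's actual caching path handles and one reason it is currently restricted to batch_size = 1 (see Limitations & Caveats).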

Maintenance & Community

  • No explicit community links (Discord/Slack) are provided in the README.
  • The project is associated with the paper "Learning to Compress Prompts with Gist Tokens" (arXiv:2304.08467).

Licensing & Compatibility

  • Codebase: Apache 2.0.
  • Data: Mixture of Self-Instruct (Apache 2.0) and Stanford Alpaca (CC BY-NC 4.0).
  • Commercial use may be restricted by the CC BY-NC 4.0 license on the Alpaca data.

Limitations & Caveats

  • Gist caching is limited to batch_size = 1 for LLaMA-7B, requiring modifications for larger batches.
  • The current implementation of gist caching may not yield significant speedups due to Python overhead; lower-level implementations are suggested for optimal performance.
  • Reproducibility of training results is sensitive to specific DeepSpeed versions.
Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

HALOs by ContextualAI

Library for aligning LLMs using human-aware loss functions
873 stars · top 0.3% · created 1 year ago · updated 3 weeks ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle

Context window extension method for LLMs (research paper, models)
2k stars · top 1.2% · created 2 years ago · updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 3 more.

Medusa by FasterDecoding

Framework for accelerating LLM generation using multiple decoding heads
3k stars · top 0.3% · created 1 year ago · updated 1 year ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO and General Partner at Paradigm).

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs
3k stars · top 0.1% · created 1 year ago · updated 11 months ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Travis Fischer (founder of Agentic), and 6 more.

codellama by meta-llama

Inference code for CodeLlama models
16k stars · top 0.0% · created 1 year ago · updated 11 months ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.

alpaca-lora by tloen

LoRA fine-tuning for LLaMA
19k stars · top 0.0% · created 2 years ago · updated 1 year ago