gisting by jayelm

Research paper implementation for prompt compression via learned "gist" tokens

Created 2 years ago · 289 stars · Top 91.9% on sourcepulse

View on GitHub
Project Summary

This repository provides code and data for "Learning to Compress Prompts with Gist Tokens," a method for reducing prompt length in large language models. It's targeted at researchers and practitioners looking to optimize LLM inference efficiency and cost. The core benefit is enabling LLMs to retain performance with significantly shorter prompts by learning "gist tokens."

How It Works

The approach inserts learned "gist tokens" after the prompt. During training, a modified attention mask prevents tokens that follow the gist tokens from attending to the original prompt, so all of the prompt's information must flow through the gist tokens. At inference, the gist tokens' activations can be computed once, cached, and reused in place of the full prompt, letting the model process a much shorter input while preserving the essential information.
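
A minimal sketch of the masking idea, assuming a decoder-only model with a boolean attention mask (the function and argument names here are illustrative, not the repo's API):

    import torch

    def gist_attention_mask(seq_len: int, gist_start: int, gist_end: int) -> torch.Tensor:
        """Causal attention mask with gist masking (True = may attend).

        Positions at or after gist_end may not attend to anything before
        gist_start, so prompt information can only reach later tokens
        through the gist tokens themselves.
        """
        # Standard causal (lower-triangular) mask.
        mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
        # Block attention from post-gist positions back to pre-gist positions.
        mask[gist_end:, :gist_start] = False
        return mask

    # Example: 4 prompt tokens, 1 gist token, 3 continuation tokens.
    print(gist_attention_mask(8, gist_start=4, gist_end=5).int())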

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires specific versions: Transformers commit fb366b9a, DeepSpeed 0.8.3.
  • LLaMA-7B experiments require base model weights in a llama-7b directory.
  • Weights & Biases account needed for training.
  • See Hugging Face for checkpoints; for LLaMA-7B these are distributed as weight diffs (a sketch of applying one follows this list).
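
The repo provides its own script for applying the diffs; the following is only a conceptual sketch of what an additive weight diff involves (function names and paths are placeholders, not the repo's actual script):

    import torch

    def apply_weight_diff(base_sd: dict, diff_sd: dict) -> dict:
        """Reconstruct fine-tuned weights as base + diff, tensor by tensor.

        Conceptual sketch only; a real script would also check that the
        two state dicts line up and verify the result against a checksum.
        """
        return {name: base_sd[name] + diff for name, diff in diff_sd.items()}

    # Hypothetical usage (paths are placeholders):
    # base = torch.load("llama-7b/pytorch_model.bin")
    # diff = torch.load("gist-llama-diff/pytorch_model.bin")
    # torch.save(apply_weight_diff(base, diff), "gist-llama/pytorch_model.bin")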

Highlighted Details

  • Gist compression currently supports batch_size = 1 for LLaMA-7B.
  • Checkpoints are provided as weight diffs for LLaMA-7B.
  • Includes scripts for applying weight diffs and for running inference with gist caching (see the sketch after this list).
  • Training supports configurable gist tokens and masking conditions (gist, pos_control, neg_control).
  • Multi-GPU training with DeepSpeed is supported for larger models.
  • Benchmarking functionality is included to measure performance.
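
As an illustration of the caching mechanism, here is a minimal sketch assuming the pinned, pre-Cache-class Transformers API, where past_key_values is a tuple of per-layer (key, value) tensors; it is not the repo's actual implementation:

    import torch

    @torch.no_grad()
    def build_gist_cache(model, prompt_ids, gist_ids):
        """Encode prompt + gist tokens once, then keep only the KV-cache
        entries at the gist positions. Decoding can reuse this small
        cache in place of re-encoding the full prompt.
        """
        input_ids = torch.cat([prompt_ids, gist_ids], dim=-1)  # [1, P + G]
        out = model(input_ids=input_ids, use_cache=True)
        n_gist = gist_ids.shape[-1]
        # Keys/values have shape [batch, heads, seq, head_dim]; keep only
        # the trailing gist positions along the sequence dimension.
        return tuple(
            (k[:, :, -n_gist:, :], v[:, :, -n_gist:, :])
            for k, v in out.past_key_values
        )

Subsequent tokens are then fed with this trimmed cache as past_key_values; position ids must still account for the discarded prompt positions, a detail the repo's actual caching path handles and one reason it is currently restricted to batch_size = 1 (see Limitations & Caveats).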

Maintenance & Community

  • No explicit community links (Discord/Slack) are provided in the README.
  • The project is associated with the paper "Learning to Compress Prompts with Gist Tokens" (arXiv:2304.08467).

Licensing & Compatibility

  • Codebase: Apache 2.0.
  • Data: Mixture of Self-Instruct (Apache 2.0) and Stanford Alpaca (CC BY-NC 4.0).
  • Commercial use may be restricted by the CC BY-NC 4.0 license on the Alpaca data.

Limitations & Caveats

  • Gist caching is limited to batch_size = 1 for LLaMA-7B, requiring modifications for larger batches.
  • The current implementation of gist caching may not yield significant speedups due to Python overhead; lower-level implementations are suggested for optimal performance.
  • Reproducibility of training results is sensitive to specific DeepSpeed versions.
Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

HALOs by ContextualAI

Library for aligning LLMs using human-aware loss functions
873 stars · top 0.3% · created 1 year ago · updated 3 weeks ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 1 more.

yarn by jquesnelle

Context window extension method for LLMs (research paper, models)
2k stars · top 1.2% · created 2 years ago · updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 3 more.

Medusa by FasterDecoding

Framework for accelerating LLM generation using multiple decoding heads
3k stars · top 0.3% · created 1 year ago · updated 1 year ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Georgios Konstantopoulos (CTO and General Partner at Paradigm).

LongLoRA by dvlab-research

LongLoRA: Efficient fine-tuning for long-context LLMs
3k stars · top 0.1% · created 1 year ago · updated 11 months ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Travis Fischer (founder of Agentic), and 6 more.

codellama by meta-llama

Inference code for CodeLlama models
16k stars · top 0.0% · created 1 year ago · updated 11 months ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.

alpaca-lora by tloen

LoRA fine-tuning for LLaMA
19k stars · top 0.0% · created 2 years ago · updated 1 year ago