gisting by jayelm

Research paper implementation for prompt compression via learned "gist" tokens

Created 2 years ago
296 stars

Top 89.4% on SourcePulse

Project Summary

This repository provides code and data for "Learning to Compress Prompts with Gist Tokens," a method for reducing prompt length in large language models. It's targeted at researchers and practitioners looking to optimize LLM inference efficiency and cost. The core benefit is enabling LLMs to retain performance with significantly shorter prompts by learning "gist tokens."

How It Works

The approach introduces "gist tokens" that are trained to summarize prompt context. During inference, these gist tokens replace the original prompt, allowing the model to process a much shorter input while preserving the essential information. This is achieved through specialized attention masking during training and a novel inference-time caching mechanism.
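
Below is a minimal sketch of the masking idea, assuming a standard causal decoder: tokens after the gist span are blocked from attending to anything before it, so the gist tokens must carry the prompt's information. The function and argument names are illustrative, not taken from the repository.

```python
import torch

def make_gist_mask(seq_len: int, gist_start: int, gist_end: int) -> torch.Tensor:
    """Boolean attention mask of shape (seq_len, seq_len); True = may attend."""
    # Ordinary causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Positions after the gist tokens cannot look back past the start of the
    # gist span, so the original prompt is reachable only through the gist tokens.
    mask[gist_end:, :gist_start] = False
    return mask

# Example: a 10-token sequence whose gist tokens occupy positions 4 and 5.
print(make_gist_mask(10, gist_start=4, gist_end=6).int())
```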

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Requires specific versions: Transformers commit fb366b9a, DeepSpeed 0.8.3 (a quick version check follows this list).
  • LLaMA-7B experiments require base model weights in a llama-7b directory.
  • Weights & Biases account needed for training.
  • See Hugging Face for checkpoints.
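
A small, illustrative sanity check (not part of the repository) that the pinned versions above are the ones actually installed:

```python
# Illustrative only: verify the pinned dependency versions from the README.
import transformers
import deepspeed

print("transformers:", transformers.__version__)  # expected: a dev build from commit fb366b9a
print("deepspeed:", deepspeed.__version__)        # expected: 0.8.3

assert deepspeed.__version__ == "0.8.3", "DeepSpeed version differs from the pinned 0.8.3"
```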

Highlighted Details

  • Gist compression currently supports batch_size = 1 for LLaMA-7B.
  • Checkpoints are provided as weight diffs for LLaMA-7B.
  • Includes scripts for applying weight diffs and performing inference with gist caching (a rough sketch of the caching idea follows this list).
  • Training supports configurable gist tokens and masking conditions (gist, pos_control, neg_control).
  • Multi-GPU training with DeepSpeed is supported for larger models.
  • Benchmarking functionality is included to measure performance.
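
The snippet below sketches the gist-caching idea with the Hugging Face API, assuming the old-style tuple key/value cache (the pinned Transformers commit predates the newer Cache classes). The model path, gist token string, and position indices are placeholders; the repository's own scripts are the reference implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: the real checkpoints are distributed as LLaMA-7B weight diffs.
tok = AutoTokenizer.from_pretrained("path/to/gist-llama-7b")
model = AutoModelForCausalLM.from_pretrained("path/to/gist-llama-7b").eval()

# 1) Encode the instruction plus gist tokens once.
prompt_ids = tok("Summarize the text below. <GIST>", return_tensors="pt").input_ids
with torch.no_grad():
    full_cache = model(prompt_ids, use_cache=True).past_key_values

# 2) Keep only the key/value states at the gist positions (here: the last token).
gist_slice = slice(prompt_ids.shape[1] - 1, prompt_ids.shape[1])
gist_cache = tuple((k[:, :, gist_slice, :], v[:, :, gist_slice, :]) for k, v in full_cache)

# 3) Later inputs attend to the cached gist activations instead of the full instruction.
new_ids = tok("Dear team, the meeting is moved to Friday.", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(new_ids, past_key_values=gist_cache, use_cache=True)
print(out.logits.shape)
```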

Maintenance & Community

  • No explicit community links (Discord/Slack) are provided in the README.
  • The project is associated with the paper "Learning to Compress Prompts with Gist Tokens" (arXiv:2304.08467).

Licensing & Compatibility

  • Codebase: Apache 2.0.
  • Data: Mixture of Self-Instruct (Apache 2.0) and Stanford Alpaca (CC BY-NC 4.0).
  • Commercial use may be restricted by the CC BY-NC 4.0 license on the Alpaca data.

Limitations & Caveats

  • Gist caching is limited to batch_size = 1 for LLaMA-7B, requiring modifications for larger batches.
  • The current implementation of gist caching may not yield significant speedups due to Python overhead; lower-level implementations are suggested for optimal performance.
  • Reproducibility of training results is sensitive to specific DeepSpeed versions.
Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
