AutoCompressors by princeton-nlp

Research code for adapting language models to compress long contexts

Created 2 years ago
312 stars

Top 86.3% on SourcePulse

View on GitHub
Project Summary

This repository provides the official implementation for "Adapting Language Models to Compress Long Contexts," enabling language models to compress extensive context into summary vectors and reason over them. It targets researchers and practitioners working with long-context NLP tasks, offering a method to overcome context length limitations in transformer models.

How It Works

The core innovation is the "AutoCompressor" architecture, which integrates a context compression mechanism directly into the language model. This is achieved by training the model to generate a fixed-size set of "summary vectors" from segments of the input context. These summary vectors are then prepended to subsequent segments as soft prompts, allowing the model to retain and reason over information from much longer contexts than its native architecture would typically support. This approach avoids the quadratic complexity of standard attention mechanisms for long sequences.
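
The mechanism can be pictured with a short sketch. This is a conceptual illustration only, not the repository's code: the decoder call, the number of summary tokens, and the hidden size are placeholder assumptions.

```python
# Conceptual sketch of one compression step (illustrative names, not the repo's API).
import torch

NUM_SUMMARY = 50    # assumed fixed number of summary vectors produced per segment
HIDDEN = 4096       # hidden size of the base model (e.g., Llama-2-7b)

# Learned embeddings for the special summary tokens appended to every segment.
summary_token_embeds = torch.nn.Parameter(torch.randn(NUM_SUMMARY, HIDDEN))

def compress_segment(decoder, segment_embeds, prev_summary_vectors):
    """One compression step: previous summary vectors act as a soft prompt,
    summary-token embeddings are appended to the segment, and the hidden
    states at those appended positions become the new summary vectors.
    For the first segment, pass an empty (batch, 0, HIDDEN) tensor."""
    batch = segment_embeds.size(0)
    inputs = torch.cat(
        [prev_summary_vectors,                                      # soft prompt from earlier segments
         segment_embeds,                                            # current text segment
         summary_token_embeds.unsqueeze(0).expand(batch, -1, -1)],  # summary token slots
        dim=1,
    )
    hidden = decoder(inputs_embeds=inputs)                          # assumed to return (batch, seq, hidden)
    new_summary_vectors = hidden[:, -NUM_SUMMARY:, :]
    # Summary vectors accumulate across segments, keeping the soft prompt small
    # relative to the raw text it replaces.
    return torch.cat([prev_summary_vectors, new_summary_vectors], dim=1)
```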

Quick Start & Requirements

  • Install: pip install packaging transformers==4.34.0 datasets==2.13.4 accelerate==0.24.1 sentencepiece==0.1.99 flash-attn==2.3.5 wandb, followed by pip install git+https://github.com/Dao-AILab/flash-attention.git#subdirectory=csrc/rotary for the flash-attention rotary kernels. A minimal usage sketch follows this list.
  • Prerequisites: PyTorch 2.1.0+, CUDA 11.8+ with CUDA_HOME set correctly (needed to build flash-attn), and hardware with bfloat16 support.
  • Resource Footprint: Requires a GPU with sufficient VRAM for Llama-2-7b (e.g., 24GB+ in bfloat16).
  • Links: Hugging Face Models, Paper
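
After installation, usage roughly follows the pattern below. This is a hedged sketch: the LlamaAutoCompressorModel class (from the repository's auto_compressor module), the output_softprompt/softprompt arguments, and the princeton-nlp/AutoCompressor-Llama-2-7b-6k checkpoint name are assumptions based on the project's documentation and should be verified against the current README.

```python
# Hedged quick-start sketch; class, argument, and checkpoint names are assumptions
# based on the repository's documentation and may differ across versions.
import torch
from transformers import AutoTokenizer
from auto_compressor import LlamaAutoCompressorModel  # module shipped in this repository

ckpt = "princeton-nlp/AutoCompressor-Llama-2-7b-6k"    # assumed Hugging Face checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = LlamaAutoCompressorModel.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16
).eval().cuda()

# 1) Compress a long context into summary vectors (a soft prompt).
context_ids = tokenizer("<long document text>", return_tensors="pt").input_ids.cuda()
summary_vectors = model(context_ids, output_softprompt=True).softprompt

# 2) Generate from a short prompt while conditioning on the compressed context.
prompt_ids = tokenizer("Question about the document: ...", return_tensors="pt").input_ids.cuda()
output_ids = model.generate(
    prompt_ids, softprompt=summary_vectors, max_new_tokens=32, do_sample=False
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```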

Highlighted Details

  • Offers pre-trained AutoCompressors based on Llama-2-7b and OPT-2.7b/1.3b, supporting context lengths up to 30k tokens.
  • Utilizes Flash Attention for reduced memory requirements during training and inference.
  • Supports both explicit generation of summary vectors and implicit, multi-step compression for extremely long inputs (see the sketch after this list).
  • Demonstrates substantially better retention of long-context information than the corresponding base models.
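
For inputs far beyond a single segment, compression can be applied step by step, feeding summary vectors from earlier segments back in while compressing the next one. The sketch below reuses the assumed softprompt API from the quick-start example; the segment length and the compress_long_document helper are illustrative, not part of the repository.

```python
# Hedged sketch of multi-step compression over a very long document; the
# softprompt/output_softprompt arguments are assumptions carried over from the
# quick-start example, and compress_long_document is an illustrative helper.
def compress_long_document(model, tokenizer, text, segment_tokens=2048):
    ids = tokenizer(text, return_tensors="pt").input_ids.cuda()
    summary_vectors = None
    for start in range(0, ids.size(1), segment_tokens):
        segment = ids[:, start:start + segment_tokens]
        kwargs = {"output_softprompt": True}
        if summary_vectors is not None:
            kwargs["softprompt"] = summary_vectors   # condition on earlier segments
        summary_vectors = model(segment, **kwargs).softprompt
    # The returned soft prompt stands in for the whole document and can be passed
    # to model.generate(..., softprompt=summary_vectors) for downstream prompts.
    return summary_vectors
```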

Maintenance & Community

The project is associated with Princeton University NLP research. For questions or bugs, users can contact the authors via email or open an issue on GitHub.

Licensing & Compatibility

The repository code is likely under a permissive license (e.g., MIT or Apache 2.0), but the underlying base models (Llama-2, OPT) carry their own licenses. The Llama-2 license restricts commercial use by companies with more than 700 million monthly active users. Compatibility with closed-source linking therefore depends on the base model licenses.

Limitations & Caveats

Flash Attention requires specific CUDA versions and hardware, and using it with use_cache=True during evaluation may be unstable. The project pins specific library versions (e.g., transformers 4.34.0), which can cause compatibility issues with newer releases.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

prompt-lookup-decoding by apoorvumang
0.2% · 566 stars
Decoding method for faster LLM generation
Created 1 year ago · Updated 1 year ago
Starred by Omar Sanseviero (DevRel at Google DeepMind), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

pyctcdecode by kensho-technologies
0% · 460 stars
CTC beam search decoder for speech recognition
Created 4 years ago · Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research
0.1% · 3k stars
Efficient fine-tuning for long-context LLMs
Created 2 years ago · Updated 1 year ago