LLMLingua by Microsoft

Prompt compression for accelerated LLM inference

created 2 years ago
5,308 stars

Top 9.6% on sourcepulse

Project Summary

This repository provides LLMLingua, LongLLMLingua, and LLMLingua-2, a suite of tools for prompt compression to accelerate LLM inference and improve long-context understanding. Targeting developers and researchers working with LLMs, these tools offer significant cost savings and performance enhancements by reducing token usage with minimal impact on output quality.

How It Works

LLMLingua employs a compact, pre-trained language model to identify and remove non-essential tokens from prompts, achieving up to 20x compression. LongLLMLingua specifically addresses the "lost in the middle" issue in long contexts by reordering and compressing information, improving RAG performance. LLMLingua-2 utilizes data distillation from larger models to create a task-agnostic compressor, offering faster performance and better out-of-domain handling.
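The pruning idea can be illustrated with a toy, self-contained sketch. This is not the actual LLMLingua algorithm: where LLMLingua scores tokens with a small causal language model's perplexity, the sketch below substitutes a hypothetical unigram-frequency score, on the assumption that highly predictable tokens carry less information and can be dropped first.

```python
from collections import Counter

def toy_compress(prompt: str, keep_ratio: float = 0.5) -> str:
    """Toy illustration of importance-based token pruning.

    Real LLMLingua uses a compact LM's token-level perplexity as the
    importance signal; here a unigram-frequency score stands in:
    frequent (more predictable) tokens are dropped first.
    """
    tokens = prompt.split()
    freq = Counter(t.lower() for t in tokens)
    # Sort token positions by frequency, rarest (most informative) first.
    ranked = sorted(range(len(tokens)), key=lambda i: freq[tokens[i].lower()])
    keep = set(ranked[: max(1, int(len(tokens) * keep_ratio))])
    # Emit the surviving tokens in their original order.
    return " ".join(t for i, t in enumerate(tokens) if i in keep)

compressed = toy_compress(
    "the cat sat on the mat and the dog sat on the rug", keep_ratio=0.4
)
# Filler words ("the", repeated "sat"/"on") are pruned; rare content
# words survive.
```

The real system refines this with budget controllers, sentence-level filtering, and iterative token-level compression, but the shape of the computation (score, rank, prune to a budget) is the same.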

Quick Start & Requirements

  • Install via pip: pip install llmlingua
  • Usage examples for LLMLingua, LongLLMLingua, and LLMLingua-2 are provided in the README.
  • Supports various models, including microsoft/phi-2 and quantized models such as TheBloke/Llama-2-7b-Chat-GPTQ (requiring under 8 GB of GPU memory).
  • Official documentation and demos are available.

Highlighted Details

  • Achieves up to 20x prompt compression with minimal performance loss.
  • Enhances RAG performance by up to 21.4% with LongLLMLingua.
  • LLMLingua-2 offers 3x-6x speed improvement over the original LLMLingua.
  • Integrations available for Prompt flow, LangChain, and LlamaIndex.

Maintenance & Community

  • Developed by Microsoft.
  • Active development with recent releases (SCBench, RetrievalAttention, MInference).
  • Contributions are welcomed via a Contributor License Agreement (CLA).
  • Follows the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • The provided README text does not state a license. Check the repository's LICENSE file before commercial use or closed-source linking.

Limitations & Caveats

  • The absence of a stated license in the README could be a blocker for commercial adoption until confirmed.
  • Although compression aims for minimal performance loss, results should be validated on your specific use case before deployment.
Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 280 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Travis Fischer (founder of Agentic), and 6 more.

  • codellama by meta-llama — Inference code for CodeLlama models. Top 0.1%, 16k stars, created 1 year ago, updated 11 months ago.