Toolkit-for-Prompt-Compression by 3DAgentWorld

Prompt compression toolkit for LLM inference efficiency

Created 1 year ago
271 stars

Top 95.0% on SourcePulse

View on GitHub
Project Summary

PCToolkit is a unified, plug-and-play toolkit for prompt compression in Large Language Models (LLMs). It offers researchers and developers a standardized framework to experiment with, evaluate, and integrate various state-of-the-art prompt compression techniques, aiming to improve inference efficiency and reduce computational costs. The toolkit supports multiple compression methods, diverse datasets, and evaluation metrics, facilitating reproducible research and practical application.

How It Works

PCToolkit employs a modular design that separates functionality into four components: Compressor, Dataset, Metric, and Runner. This architecture makes it straightforward to integrate new compression algorithms, datasets, and evaluation metrics. The toolkit provides a unified interface to five distinct compressors: Selective Context, LLMLingua, LongLLMLingua, SCRL, and Keep it Simple. It supports evaluation across various NLP tasks, including reconstruction, summarization, and question answering, using 11 datasets and more than five evaluation metrics.
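The plug-and-play idea behind such a design can be sketched as a common `Compressor` interface plus a registry that maps method names to implementations. This is a minimal illustration, not PCToolkit's actual API: all class and function names here are invented, and the toy stop-word filter only stands in for real methods such as Selective Context or LLMLingua.

```python
from abc import ABC, abstractmethod


class Compressor(ABC):
    """Common interface that every compression method implements."""

    @abstractmethod
    def compress(self, prompt: str, ratio: float) -> str:
        """Return a shortened prompt, keeping roughly `ratio` of the tokens."""


class StopwordCompressor(Compressor):
    """Toy method: drop common function words, then truncate to the target size."""

    STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "that", "it"}

    def compress(self, prompt: str, ratio: float) -> str:
        tokens = prompt.split()
        target = max(1, int(len(tokens) * ratio))
        kept = [t for t in tokens if t.lower().strip(".,") not in self.STOPWORDS]
        return " ".join(kept[:target] if len(kept) > target else kept)


# Registry that makes methods swappable behind one interface.
REGISTRY: dict[str, type[Compressor]] = {"stopword": StopwordCompressor}


def get_compressor(name: str) -> Compressor:
    """Look up a compression method by name."""
    return REGISTRY[name]()


if __name__ == "__main__":
    c = get_compressor("stopword")
    print(c.compress("Summarize the main findings of the attached report in a paragraph.", ratio=0.5))
```

A new method plugs in by subclassing `Compressor` and registering itself, which is the kind of extensibility the modular design described above is aiming for.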

Quick Start & Requirements

Highlighted Details

  • Supports 5 distinct prompt compression methods (Selective Context, LLMLingua, LongLLMLingua, SCRL, Keep it Simple).
  • Integrates 11 datasets and 5+ evaluation metrics for comprehensive benchmarking.
  • Modular design allows for easy addition of new compressors, datasets, and metrics.
  • Evaluated across a wide range of NLP tasks including reconstruction, summarization, mathematical problem-solving, and code completion.

Maintenance & Community

The project is associated with research by Li et al. and Jiang et al.; further details can be found in the linked paper and technical report.

Licensing & Compatibility

The repository is licensed under the MIT license. This license generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

Model weights need to be downloaded manually, and API keys for services like OpenAI must be configured. The README indicates that modifications to metrics might be necessary, especially for the LongBench dataset.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 3 more.

prompt-lookup-decoding by apoorvumang

0.2% · 566 stars
Decoding method for faster LLM generation
Created 1 year ago · Updated 1 year ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

10.6% · 2k stars
Speculative decoding research paper for faster LLM inference
Created 1 year ago · Updated 1 week ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research

0.1% · 3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
Created 2 years ago · Updated 1 year ago