Prompt compression toolkit for LLM inference efficiency
PCToolkit is a unified, plug-and-play toolkit for prompt compression in Large Language Models (LLMs). It offers researchers and developers a standardized framework to experiment with, evaluate, and integrate various state-of-the-art prompt compression techniques, aiming to improve inference efficiency and reduce computational costs. The toolkit supports multiple compression methods, diverse datasets, and evaluation metrics, facilitating reproducible research and practical application.
How It Works
PCToolkit employs a modular design, separating functionalities into Compressor, Dataset, Metric, and Runner modules. This architecture allows for easy integration of new compression algorithms, datasets, and evaluation metrics. The toolkit provides a unified interface to five distinct compressors: Selective Context, LLMLingua, LongLLMLingua, SCRL, and Keep it Simple. It supports evaluation across various NLP tasks, including reconstruction, summarization, and question answering, using 11 datasets and over 5 metrics.
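The Compressor/Dataset/Metric/Runner split described above can be sketched as follows. This is an illustrative sketch only: the class and method names are hypothetical, not the actual PCToolkit API, and a toy truncation compressor stands in for real methods such as Selective Context or LLMLingua.

```python
# Hypothetical sketch of a modular compressor/runner design.
# Names are illustrative; they are not the actual PCToolkit API.

class Compressor:
    """Base interface that every compression method would implement."""
    def compress(self, prompt: str, ratio: float) -> str:
        raise NotImplementedError

class TruncationCompressor(Compressor):
    """Toy stand-in: keeps only the first `ratio` fraction of tokens.
    A real method (e.g. LLMLingua) would score and drop low-information
    tokens instead of truncating."""
    def compress(self, prompt: str, ratio: float) -> str:
        tokens = prompt.split()
        keep = max(1, int(len(tokens) * ratio))
        return " ".join(tokens[:keep])

class Runner:
    """Ties a compressor to inputs, mirroring the toolkit's separation
    of compression from datasets and metrics."""
    def __init__(self, compressor: Compressor):
        self.compressor = compressor

    def run(self, prompts: list[str], ratio: float = 0.5) -> list[str]:
        return [self.compressor.compress(p, ratio) for p in prompts]

runner = Runner(TruncationCompressor())
out = runner.run(["the quick brown fox jumps over the lazy dog"], ratio=0.5)
print(out[0])  # → "the quick brown fox"
```

Because each compressor only has to satisfy the `compress` interface, swapping in a new algorithm or evaluation metric does not require touching the runner, which is the point of the plug-and-play design.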
Quick Start & Requirements
Clone the repository and run `pip install -r requirements.txt`. Model weights must be downloaded manually and placed in the `./models` folder. Hugging Face tokens and OpenAI API keys are needed for certain functionalities.
Maintenance & Community
The project is associated with research by Li, Yucheng et al. and Jiang, Huiqiang et al.; further details can be found in the linked paper and technical report.
Licensing & Compatibility
The repository is licensed under the MIT license. This license generally permits commercial use and integration into closed-source projects.
Limitations & Caveats
Model weights need to be downloaded manually, and API keys for services like OpenAI must be configured. The README indicates that modifications to metrics might be necessary, especially for the LongBench dataset.
The repository was last updated about 5 months ago and appears inactive.