List of must-read papers on KV cache compression
This repository is a curated collection of research papers on KV cache compression techniques for Large Language Models (LLMs). It provides a structured overview of methods for improving LLM inference efficiency, aimed at researchers and engineers working on LLM acceleration and memory optimization.
How It Works
The repository categorizes papers into distinct approaches for KV cache compression, including pruning/evicting tokens, merging KV cache entries across layers, low-rank approximations, quantization, and prompt compression. This structured organization allows users to quickly identify and explore specific optimization strategies and their underlying methodologies.
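To make two of these categories concrete, here is a minimal NumPy sketch of KV cache quantization and attention-based token eviction. It is not drawn from any paper in the list; the function names (`quantize_kv`, `evict_tokens`), shapes, and the use of raw attention scores as the eviction signal are all illustrative assumptions.

```python
import numpy as np

def quantize_kv(kv: np.ndarray, num_bits: int = 8):
    """Per-tensor symmetric quantization of a KV cache block.

    Returns integer codes plus the scale needed to dequantize.
    (Illustrative sketch; real methods often quantize per channel or group.)
    """
    qmax = 2 ** (num_bits - 1) - 1                    # e.g. 127 for 8-bit
    scale = max(float(np.abs(kv).max()) / qmax, 1e-8)  # guard against all-zero input
    codes = np.clip(np.round(kv / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize_kv(codes: np.ndarray, scale: float) -> np.ndarray:
    return codes.astype(np.float32) * scale

def evict_tokens(keys, values, attn_scores, keep: int):
    """Token eviction: retain only the `keep` most-attended cache entries."""
    kept = np.argsort(attn_scores)[-keep:]  # indices of highest-scoring tokens
    kept.sort()                             # preserve original token order
    return keys[kept], values[kept]

# Toy cache: 16 tokens, head dimension 64.
rng = np.random.default_rng(0)
keys = rng.standard_normal((16, 64)).astype(np.float32)
values = rng.standard_normal((16, 64)).astype(np.float32)

codes, scale = quantize_kv(keys)
print("mean quantization error:", np.abs(dequantize_kv(codes, scale) - keys).mean())

scores = rng.random(16)  # stand-in for accumulated attention weights
k2, v2 = evict_tokens(keys, values, scores, keep=8)
print("cache shrank from", keys.shape[0], "to", k2.shape[0], "tokens")
```

The papers in the list differ mainly in how they choose the eviction signal (accumulated attention, recency, learned importance) and the quantization granularity; this sketch only shows the mechanical skeleton shared by those families.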
Quick Start & Requirements
This is a curated list of papers and does not involve code execution.
Maintenance & Community
The project is actively maintained, with recent commit activity, and welcomes contributions via Pull Requests.
Licensing & Compatibility
The repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
This repository is a literature collection and does not provide implementations or benchmarks of the discussed methods. Users must refer to individual papers for implementation details and performance evaluations.