Curated list of LLM KV cache research papers with code
This repository is a curated list of research papers and associated code focused on optimizing Large Language Model (LLM) Key-Value (KV) cache efficiency. It targets researchers and engineers working on LLM inference, offering a structured overview of techniques for KV cache compression, merging, budget allocation, quantization, and decomposition that improve inference speed and reduce memory footprint.
How It Works
The project categorizes recent advancements in LLM KV cache optimization, presenting papers with links to their PDFs and code repositories. A star rating highlights influential or highly-rated works within each category, such as KV cache compression, merging, and quantization. This organization gives readers a quick view of the research landscape and of practical implementations for efficient LLM inference; a conceptual sketch of the kind of technique these papers study is shown below.
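To give a flavor of what the listed compression papers target, here is a minimal, illustrative sketch (not taken from any specific paper in the list) of a toy eviction policy that trims a per-head KV cache to a fixed token budget by keeping a recent window plus the most-attended older tokens. The function and parameter names (`compress_kv`, `kv_budget`, `recent_window`) are assumptions for illustration only.

```python
# Conceptual sketch of KV cache compression via token eviction.
# Keeps a recent window plus "heavy hitter" tokens with high cumulative attention.
import numpy as np

def compress_kv(keys, values, attn_scores, kv_budget=64, recent_window=16):
    """Reduce a single head's KV cache to `kv_budget` tokens.

    keys, values: (seq_len, head_dim) cached key/value vectors.
    attn_scores:  (seq_len,) cumulative attention each cached token has received.
    """
    seq_len = keys.shape[0]
    if seq_len <= kv_budget:
        return keys, values  # already within budget, nothing to evict

    # Always keep the most recent tokens (local context matters during decoding).
    recent = np.arange(seq_len - recent_window, seq_len)

    # Fill the remaining budget with the most-attended older tokens.
    older = np.arange(seq_len - recent_window)
    top_k = kv_budget - recent_window
    heavy = older[np.argsort(attn_scores[older])[-top_k:]]

    keep = np.sort(np.concatenate([heavy, recent]))
    return keys[keep], values[keep]

# Toy usage: compress a 128-token cache down to a 64-token budget.
rng = np.random.default_rng(0)
k = rng.normal(size=(128, 64))
v = rng.normal(size=(128, 64))
scores = rng.random(128)
k_small, v_small = compress_kv(k, v, scores)
print(k_small.shape)  # (64, 64)
```

The papers in the list refine this basic idea in many directions, for example by allocating different budgets per layer or head, merging evicted tokens instead of dropping them, or quantizing the retained keys and values.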
Quick Start & Requirements
This repository is a curated list and does not require installation or execution. It serves as a reference guide to external papers and code.
Maintenance & Community
The repository is maintained by Zefan-Cai and welcomes community contributions via pull requests. It is open-source and encourages users to star the repository.
Licensing & Compatibility
The repository is licensed under the GNU General Public License v3.0. This is a strong copyleft license, meaning derivative works must also be open-sourced under the same license.
Limitations & Caveats
This is a curated list of external resources; it does not provide any executable code or direct tooling for KV cache optimization itself. The quality and availability of linked code repositories may vary.