Curated list of LLM KV cache research papers with code
This repository is a curated list of research papers and associated code focused on optimizing Large Language Model (LLM) Key-Value (KV) cache efficiency. It targets researchers and engineers working on LLM inference, offering a structured overview of techniques for KV cache compression, merging, budget allocation, quantization, and decomposition that improve inference speed and reduce memory footprint.
How It Works
The project categorizes recent advancements in LLM KV cache optimization, presenting papers with links to their PDFs and code repositories. A star rating highlights influential or highly-rated works within each category, such as KV cache compression, merging, and quantization. This organization gives readers a quick view of the research landscape and of practical implementations for efficient LLM inference; a conceptual sketch of the kind of technique these papers study is shown below.
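To give a flavor of what the listed compression papers target, here is a minimal, illustrative sketch (not taken from any specific paper in the list) of a toy eviction policy that trims a per-head KV cache to a fixed token budget by keeping a recent window plus the most-attended older tokens. The function and parameter names (`compress_kv`, `kv_budget`, `recent_window`) are assumptions for illustration only.

```python
# Conceptual sketch of KV cache compression via token eviction.
# Keeps a recent window plus "heavy hitter" tokens with high cumulative attention.
import numpy as np

def compress_kv(keys, values, attn_scores, kv_budget=64, recent_window=16):
    """Reduce a single head's KV cache to `kv_budget` tokens.

    keys, values: (seq_len, head_dim) cached key/value vectors.
    attn_scores:  (seq_len,) cumulative attention each cached token has received.
    """
    seq_len = keys.shape[0]
    if seq_len <= kv_budget:
        return keys, values  # already within budget, nothing to evict

    # Always keep the most recent tokens (local context matters during decoding).
    recent = np.arange(seq_len - recent_window, seq_len)

    # Fill the remaining budget with the most-attended older tokens.
    older = np.arange(seq_len - recent_window)
    top_k = kv_budget - recent_window
    heavy = older[np.argsort(attn_scores[older])[-top_k:]]

    keep = np.sort(np.concatenate([heavy, recent]))
    return keys[keep], values[keep]

# Toy usage: compress a 128-token cache down to a 64-token budget.
rng = np.random.default_rng(0)
k = rng.normal(size=(128, 64))
v = rng.normal(size=(128, 64))
scores = rng.random(128)
k_small, v_small = compress_kv(k, v, scores)
print(k_small.shape)  # (64, 64)
```

The papers in the list refine this basic idea in many directions, for example by allocating different budgets per layer or head, merging evicted tokens instead of dropping them, or quantizing the retained keys and values.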
Quick Start & Requirements
This repository is a curated list and does not require installation or execution. It serves as a reference guide to external papers and code.
Maintenance & Community
The repository is maintained by Zefan-Cai and welcomes community contributions via pull requests. It is open-source and encourages users to star the repository.
Licensing & Compatibility
The repository is licensed under the GNU General Public License v3.0. This is a strong copyleft license, meaning derivative works must also be open-sourced under the same license.
Limitations & Caveats
This is a curated list of external resources; it does not provide any executable code or direct tooling for KV cache optimization itself. The quality and availability of linked code repositories may vary.