jjiantong / Survey on efficient LLM serving via KV cache optimization
This repository serves as a comprehensive survey of system-aware KV cache optimization techniques for efficient Large Language Model (LLM) serving. It targets researchers and engineers seeking to improve LLM inference performance without modifying model architectures or retraining. The primary benefit is a structured, taxonomy-driven overview of existing methods, their trade-offs, and open research challenges.
How It Works
The survey organizes KV cache optimization strategies into three core behavioral dimensions: Temporal (access/computation timing), Spatial (placement/migration), and Structural (representation/management). This taxonomy facilitates analysis of how these behaviors interact (co-design affinity) and influence key serving objectives like latency, throughput, and memory usage (behavior-objective effects). This systematic approach aims to highlight novel research directions and identify critical open problems in LLM serving efficiency.
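To make the memory objective concrete, the sketch below estimates how quickly the KV cache grows with sequence length and batch size. This is a back-of-the-envelope illustration, not material from the survey; the model configuration (a Llama-2-7B-like setup with 32 layers, 32 KV heads, and head dimension 128) and the function name `kv_cache_bytes` are assumptions for illustration only.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, dtype_bytes: int = 2) -> int:
    """Bytes needed to store keys and values for every token in the batch."""
    # Factor of 2 accounts for storing both the key and the value per head.
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return per_token * seq_len * batch_size

# Assumed Llama-2-7B-like configuration: 32 layers, 32 KV heads, head_dim 128,
# fp16 (2 bytes per element), 4096-token context, batch of 8 sequences.
footprint = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                           seq_len=4096, batch_size=8, dtype_bytes=2)
print(f"{footprint / 2**30:.1f} GiB")  # ~16 GiB for this configuration
```

Under these assumptions the cache alone approaches the capacity of a single accelerator, which is why the temporal, spatial, and structural behaviors surveyed here trade off directly against latency, throughput, and memory.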
Quick Start & Requirements
This repository is a curated list of research papers; there is no software to install or run. New papers can be contributed via pull requests or issues.
Maintenance & Community
The survey and repository are under active development and updated regularly; contributions of relevant papers via pull requests or issues are encouraged. Cite the survey paper: Jiang et al., "Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization" [DOI: 10.36227/techrxiv.176046306.66521015/v3].
Licensing & Compatibility
No specific software license is mentioned for the repository itself. The content is a survey of research papers, each with its own licensing implications.
Limitations & Caveats
As a research survey, this repository does not offer a deployable system. Its scope is limited to "system-aware, serving-time, KV-centric optimization methods" that do not require model retraining or architectural changes. The content is continuously evolving due to active development.