Awesome-KV-Cache-Management by TreeAI-Lab

A comprehensive survey of KV cache management techniques for LLM inference acceleration

Created 11 months ago
254 stars

Top 99.0% on SourcePulse

Project Summary

This repository serves as a comprehensive, regularly updated survey of research papers focused on KV Cache Management techniques for Large Language Model (LLM) acceleration. It targets researchers, engineers, and power users involved in LLM development and optimization, providing a structured overview of academic advancements and their associated code implementations to facilitate informed adoption decisions.

How It Works

The project functions as a curated bibliography, systematically categorizing research papers on KV cache management. It employs a detailed taxonomy spanning token-level, model-level, and system-level optimizations, with sub-categories such as KV Cache Selection, Budget Allocation, Quantization, Low-rank Decomposition, Attention Grouping, and Architecture Alteration, alongside various system-level optimizations. Each entry links to the research paper and, where available, its code repository.
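All of the surveyed optimizations target the key/value cache that transformer decoders accumulate during autoregressive generation. As background, here is a minimal illustrative sketch of that mechanism (toy single-head attention in plain Python; the class and function names are hypothetical and not taken from the repository or any surveyed paper):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class KVCache:
    """Toy single-head KV cache: key/value vectors for past tokens are
    stored once and reused, so each decode step only processes the new
    token instead of re-attending from scratch."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def attend(query, cache):
    """Scaled dot-product attention for the newest token over all cached
    keys/values (causal by construction: the cache only holds the past)."""
    d = len(query)
    weights = softmax([dot(k, query) / math.sqrt(d) for k in cache.keys])
    return [sum(w * v[i] for w, v in zip(weights, cache.values))
            for i in range(d)]
```

Because the cache grows linearly with sequence length, its memory footprint is what the token-, model-, and system-level techniques in the taxonomy try to shrink or manage.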

Quick Start & Requirements

This repository is a curated survey of research papers and does not contain runnable code for direct installation or execution.

Highlighted Details

  • Recent news includes paper acceptances for TMLR 2025 and ACL 2025, the release of a numerical benchmark (NumericBench), and an LLM inference acceleration system (LoopServe).
  • Features a detailed taxonomy of KV cache management strategies, spanning token-level, model-level, and system-level optimizations.
  • Provides links to numerous research papers and their corresponding code implementations.
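Of the sub-categories above, KV cache quantization is among the simplest to illustrate: cached floats are stored at reduced precision to cut memory. A hedged sketch of symmetric per-vector int8 quantization (illustrative only; not code from the repository or any surveyed paper):

```python
def quantize_int8(vec):
    """Symmetric per-vector int8 quantization: store one float scale
    plus one small integer per element, instead of a full float each."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid zero scale
    return [round(x / scale) for x in vec], scale

def dequantize(qvec, scale):
    """Recover approximate floats; error per element is at most ~scale/2."""
    return [q * scale for q in qvec]
```

Real systems refine this with per-channel or per-group scales and asymmetric ranges; the surveyed papers cover those trade-offs in detail.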

Maintenance & Community

Contributions of new papers or modifications are welcomed via email to haoyang-comp.li@polyu.edu.hk or by opening an issue. The survey is updated regularly.

Licensing & Compatibility

No specific software license is mentioned in the provided README content.

Limitations & Caveats

As a survey, its coverage is dependent on the ongoing research landscape and community contributions. Detailed information on datasets and benchmarks is deferred to an external paper.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 22 stars in the last 30 days
