TreeAI-Lab: LLM inference acceleration through a comprehensive KV cache management survey
This repository is a comprehensive, regularly updated survey of research papers on KV Cache Management techniques for Large Language Model (LLM) acceleration. It targets researchers and engineers working on LLM development and optimization, providing a structured overview of academic advances and their associated code implementations to support informed adoption decisions.
How It Works
The project functions as a curated bibliography, systematically categorizing research papers on KV Cache Management. It employs a detailed taxonomy spanning token-level, model-level, and system-level optimizations, with sub-categories such as KV Cache Selection, Budget Allocation, Quantization, Low-rank Decomposition, Attention Grouping, and Architecture Alteration. Each entry links to the research paper and, where available, its code repository.
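For readers new to the area, the minimal Python sketch below (not taken from any surveyed paper; names such as ToyKVCache and quantize_int8 are hypothetical) illustrates how two of the token-level ideas in the taxonomy, selection under a fixed budget and 8-bit quantization, might look in code.

```python
# Hypothetical sketch: a toy KV cache combining token-level selection
# (evict the least important entry once a budget is exceeded) with
# symmetric int8 quantization of the stored K/V tensors.
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization; returns (codes, scale)."""
    scale = np.abs(x).max() / 127.0 + 1e-8
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

class ToyKVCache:
    def __init__(self, budget: int):
        self.budget = budget              # max tokens kept (budget allocation)
        self.keys, self.values = [], []   # lists of (codes, scale) pairs
        self.scores = []                  # importance score per cached token

    def append(self, k: np.ndarray, v: np.ndarray, score: float):
        self.keys.append(quantize_int8(k))
        self.values.append(quantize_int8(v))
        self.scores.append(score)
        if len(self.scores) > self.budget:
            # KV cache selection: drop the lowest-scoring token.
            drop = int(np.argmin(self.scores))
            for buf in (self.keys, self.values, self.scores):
                buf.pop(drop)

    def materialize(self):
        # Dequantize the retained entries back to float32 for attention.
        K = np.stack([dequantize_int8(q, s) for q, s in self.keys])
        V = np.stack([dequantize_int8(q, s) for q, s in self.values])
        return K, V

# Usage: cache 8 tokens under a budget of 4, then read back the kept K/V.
rng = np.random.default_rng(0)
cache = ToyKVCache(budget=4)
for _ in range(8):
    cache.append(rng.normal(size=64), rng.normal(size=64), score=rng.random())
K, V = cache.materialize()
print(K.shape, V.shape)  # (4, 64) (4, 64)
```

Real systems surveyed in the repository make these decisions per attention head and layer, and use far more sophisticated importance scores; the sketch only shows where the two mechanisms sit relative to each other.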
Quick Start & Requirements
This repository is a curated survey of research papers and does not contain runnable code for direct installation or execution.
Highlighted Details
Maintenance & Community
Contributions of new papers or corrections are welcome via email to haoyang-comp.li@polyu.edu.hk or by opening an issue. The survey is updated regularly.
Licensing & Compatibility
The README does not specify a software license.
Limitations & Caveats
As a survey, its coverage depends on the ongoing research landscape and on community contributions. Detailed information on datasets and benchmarks is deferred to an external paper.