Awesome-LLM-KV-Cache by Zefan-Cai

Curated list of LLM KV cache research papers with code

created 1 year ago
342 stars

Top 81.9% on sourcepulse

Project Summary

This repository is a curated list of research papers and associated code focused on optimizing Large Language Model (LLM) Key-Value (KV) cache efficiency. It targets researchers and engineers working on LLM inference, providing a structured overview of techniques for KV cache compression, merging, budget allocation, quantization, and decomposition that aim to improve inference speed and reduce memory footprint.

How It Works

The project categorizes recent advancements in LLM KV cache optimization, presenting papers with links to their PDFs and code repositories. It uses a star rating system to highlight influential or highly-rated works within each category, such as KV cache compression, merging, and quantization. The organization facilitates a quick understanding of the research landscape and practical implementations for efficient LLM inference.
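For context on what these papers optimize: during autoregressive decoding, the keys and values of all previous tokens are cached so attention does not recompute them at every step. A minimal sketch of this mechanism (illustrative shapes and names, not taken from any linked repository) might look like:

```python
import numpy as np

def attention_step(q, k_cache, v_cache, k_new, v_new):
    """One autoregressive decoding step using a KV cache.

    Instead of recomputing keys/values for the whole prefix,
    the new token's K/V rows are appended to the cache and
    attention runs over the cached tensors.
    Shapes (illustrative): q, k_new, v_new are (d,); caches are (t, d).
    """
    k_cache = np.vstack([k_cache, k_new[None, :]])  # (t+1, d)
    v_cache = np.vstack([v_cache, v_new[None, :]])  # (t+1, d)
    scores = k_cache @ q / np.sqrt(q.shape[0])      # (t+1,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over prefix
    out = weights @ v_cache                         # (d,) attention output
    return out, k_cache, v_cache
```

Because the cache grows linearly with sequence length, its memory footprint dominates long-context inference, which is what the compression, merging, and quantization papers in this list target.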

Quick Start & Requirements

This repository is a curated list and does not require installation or execution. It serves as a reference guide to external papers and code.

Highlighted Details

  • Comprehensive categorization of KV cache optimization techniques including compression, merging, budget allocation, quantization, and decomposition.
  • Includes recent papers (up to July 2024) with direct links to research PDFs and code implementations.
  • Features a rating system (⭐️ to ⭐️⭐️⭐️) to indicate the significance or quality of the research.
  • Covers trending inference topics and system architectures related to KV cache management.
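As a rough illustration of the quantization category above, one common approach is to store each cached row in low precision plus a per-row scale. This is a toy sketch of symmetric int8 quantization (my own simplified example, not code from any paper in the list):

```python
import numpy as np

def quantize_kv(x, bits=8):
    """Symmetric per-row quantization of a cached K or V tensor.

    Each row is stored as int8 plus one float32 scale, reducing
    KV-cache memory roughly 4x versus fp32 at some accuracy cost.
    """
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)        # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_kv(q, scale):
    """Recover an approximate fp32 tensor from the int8 cache."""
    return q.astype(np.float32) * scale
```

The papers in the quantization section refine this basic idea with per-channel scales, mixed precision across layers, and outlier handling.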

Maintenance & Community

The repository is maintained by Zefan-Cai and welcomes community contributions via pull requests. It is open-source and encourages users to star the repository.

Licensing & Compatibility

The repository is licensed under the GNU General Public License v3.0. This is a strong copyleft license, meaning derivative works must also be open-sourced under the same license.

Limitations & Caveats

This is a curated list of external resources; it does not provide any executable code or direct tooling for KV cache optimization itself. The quality and availability of linked code repositories may vary.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 59 stars in the last 90 days
