Awesome-KV-Cache-Compression by October2001

List of must-read papers on KV cache compression

Created 1 year ago · 496 stars · Top 63.4% on sourcepulse

Project Summary

This repository is a curated collection of research papers focused on KV cache compression techniques for Large Language Models (LLMs). It aims to provide a comprehensive overview of methods for optimizing LLM inference efficiency, targeting researchers and engineers working on LLM acceleration and memory optimization.

How It Works

The repository categorizes papers into distinct approaches to KV cache compression: pruning/evicting tokens, merging KV cache entries across layers, low-rank approximation, quantization, and prompt compression. This structured organization lets readers quickly locate a specific optimization strategy and the methodology behind it; a minimal sketch of the eviction idea follows below.
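
The repository itself ships no code, so the following is only a minimal, hypothetical sketch of the eviction family of methods: keep a small recency window plus the cached tokens that have received the most attention, and drop the rest. The tensor shapes, the scoring rule, and the use of PyTorch are assumptions for illustration, not a method taken from any paper in the list.

    # Illustrative sketch only -- keep a recency window plus the most-attended tokens.
    import torch

    def evict_kv_cache(keys, values, attn_scores, budget, recent_window=32):
        """keys/values: [seq_len, head_dim]; attn_scores: [seq_len] cumulative
        attention each cached token has received; budget: tokens to keep."""
        seq_len = keys.shape[0]
        if seq_len <= budget:
            return keys, values                          # nothing to evict

        split = max(seq_len - recent_window, 0)
        recent = torch.arange(split, seq_len)            # always keep the newest tokens
        remaining = max(budget - recent.numel(), 0)      # slots left for "heavy hitter" tokens
        heavy = torch.topk(attn_scores[:split], k=min(remaining, split)).indices

        keep = torch.cat([heavy, recent]).sort().values  # kept indices, in original order
        return keys[keep], values[keep]

    # Example: compress a 512-token cache down to a 128-token budget.
    k, v = torch.randn(512, 64), torch.randn(512, 64)
    scores = torch.rand(512)                             # stand-in for accumulated attention
    k_small, v_small = evict_kv_cache(k, v, scores, budget=128)
    print(k_small.shape)                                 # torch.Size([128, 64])

The eviction policies in the listed papers differ in how they score tokens and when they evict; the sketch only conveys the general shape of the approach.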

Quick Start & Requirements

This is a curated list of papers and does not involve code execution.

Highlighted Details

  • Comprehensive coverage of KV cache compression methods, including pruning, merging, low-rank approximation, quantization, and prompt compression (a quantization sketch follows this list).
  • Includes links to original research papers (PDFs) for in-depth study.
  • Features survey papers that provide a high-level overview of the field.
  • Regularly updated with new research on LLM inference optimization.
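
As referenced in the first bullet above, here is a minimal, hypothetical sketch of another covered family, KV cache quantization: storing keys and values as int8 with a per-token scale instead of float32. The symmetric rounding scheme and function names are assumptions for illustration; published methods typically add per-channel scales, outlier handling, or mixed precision.

    # Illustrative sketch only -- per-token symmetric int8 quantization of a KV tensor.
    import torch

    def quantize_kv(x):
        """x: [seq_len, head_dim] float tensor -> (int8 tensor, per-token scale)."""
        scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
        q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
        return q, scale

    def dequantize_kv(q, scale):
        return q.to(torch.float32) * scale

    k = torch.randn(512, 64)
    k_q, k_scale = quantize_kv(k)
    k_hat = dequantize_kv(k_q, k_scale)
    print((k - k_hat).abs().max())               # small reconstruction error
    print(k_q.element_size(), k.element_size())  # 1 byte vs. 4 bytes per element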

Maintenance & Community

The project is actively maintained, with the most recent commit only days old (see Health Check below). It welcomes contributions via Pull Requests.

Licensing & Compatibility

The repository is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

This repository is a literature collection and does not provide implementations or benchmarks of the discussed methods. Users must refer to individual papers for implementation details and performance evaluations.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 105 stars in the last 90 days
