KV cache compression for reasoning models
Top 34.6% on SourcePulse
R-KV addresses the significant memory overhead of KV caches in Large Language Models (LLMs) during long-form reasoning tasks like Chain-of-Thought. It targets researchers and engineers working with reasoning-focused LLMs, offering substantial memory savings and throughput improvements with minimal accuracy loss.
How It Works
R-KV employs a novel redundancy-aware KV cache compression strategy during decoding. It scores newly generated tokens based on both their importance (derived from attention weights) and their non-redundancy (using cosine similarity to identify and prune near-duplicates). A joint selection mechanism then retains the top-k tokens within a budget, balancing memory savings against accuracy. This approach is advantageous as it specifically targets the redundancy inherent in reasoning traces, unlike methods optimized for prompt compression.
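To make the selection step concrete, below is a minimal PyTorch sketch of joint importance/redundancy scoring. It is an illustration of the idea described above, not the repository's implementation: the tensor shapes, the mixing weight alpha, the budget value, and the function name select_kv_budget are all assumptions, and R-KV's actual scoring details may differ.

# Illustrative sketch of redundancy-aware KV selection (not R-KV's real code).
import torch

def select_kv_budget(keys, attn_weights, budget, alpha=0.5):
    """Keep `budget` cached tokens, trading off attention importance
    against redundancy among key vectors.

    keys:         (seq_len, head_dim)  cached key vectors
    attn_weights: (seq_len,)           attention mass each cached token received
    """
    # Importance: normalized attention mass per cached token.
    importance = attn_weights / attn_weights.sum()

    # Redundancy: each token's maximum cosine similarity to any other
    # cached token; near-duplicates score close to 1.
    normed = torch.nn.functional.normalize(keys, dim=-1)
    sim = normed @ normed.T
    sim.fill_diagonal_(-1.0)           # ignore self-similarity
    redundancy = sim.max(dim=-1).values

    # Joint score: tokens that are important AND non-redundant rank highest.
    score = alpha * importance + (1 - alpha) * (1 - redundancy)
    keep = torch.topk(score, k=min(budget, keys.shape[0])).indices
    return torch.sort(keep).values     # preserve original token order

# Toy usage: 16 cached tokens, keep 8 within the budget.
keys = torch.randn(16, 64)
attn = torch.rand(16)
print(select_kv_budget(keys, attn, budget=8))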
Quick Start & Requirements
Install the dependencies and the package in editable mode:
pip install -r requirements.txt
pip install -e .
When loading the model in Python, enable FlashAttention 2:
model = AutoModelForCausalLM.from_pretrained("model_name_or_path", attn_implementation="flash_attention_2")
Run the provided example:
bash examples/run.sh
or invoke the math benchmark script directly:
python3 ./run_math.py ...
To set up the evaluation environment:
cd evaluation/latex2sympy && pip install -e . && cd .. && pip install -r requirements.txt
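For context, here is a minimal end-to-end loading-and-generation sketch using only the standard Hugging Face API referenced above; the model identifier, dtype, prompt, and generation settings are placeholder assumptions, and enabling R-KV's cache compression itself follows the repository's example scripts (examples/run.sh, run_math.py) rather than anything shown here.

# Hedged usage sketch: load a model with FlashAttention 2 and generate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "model_name_or_path"  # placeholder, e.g. a DeepSeek-R1 distilled variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # as required in the Quick Start
    device_map="auto",
)

prompt = "Solve: what is 17 * 24? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))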
Highlighted Details
Maintenance & Community
The project was released on May 25, 2025. No community links (Discord/Slack) or active contributor information is provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. The citation lists authors from multiple academic institutions, which suggests a research-oriented release, but no licensing terms are given; suitability for commercial use or closed-source linking is therefore unspecified.
Limitations & Caveats
The README does not detail specific limitations, unsupported platforms, or known bugs. The project is a recent release whose evaluation focuses on math reasoning benchmarks (MATH-500, AIME-24) and DeepSeek-R1 variants.