CAG by hhhuang

CAG: RAG alternative using LLM context windows, research paper

created 7 months ago
1,350 stars

Top 30.4% on sourcepulse

Project Summary

Cache-Augmented Generation (CAG) is a retrieval-free alternative to Retrieval-Augmented Generation (RAG) for grounding LLM responses in external knowledge. Instead of retrieving documents at query time, CAG preloads the relevant data into the LLM's context window and precomputes the model's KV cache, targeting users who want lower latency, fewer retrieval errors, and a simpler system design than traditional RAG.

How It Works

CAG bypasses real-time retrieval by preloading all required external knowledge into the LLM's extended context window and precomputing its key-value (KV) cache. At inference time, the model reuses this cache and generates answers directly, with no separate retrieval step. The approach aims to match or exceed RAG's accuracy with a simpler architecture and lower latency.
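The mechanism can be illustrated with a toy single-head attention layer in NumPy. This is an illustrative sketch, not the repository's kvcache.py (the real implementation caches the per-layer past key/value tensors of Llama-3.1): the knowledge tokens' keys and values are projected once, each query only adds its own token, and the output is identical to re-encoding everything from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy embedding dimension
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    s = K @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

# "Preloading": project the knowledge tokens into keys/values once.
X_knowledge = rng.normal(size=(100, d))  # stand-in for knowledge-token states
K_cache = X_knowledge @ Wk               # the KV cache -- computed a single time
V_cache = X_knowledge @ Wv

# Query time: only the new token is projected; cached K/V are reused as-is.
x_q = rng.normal(size=d)
out_cached = attend(
    x_q @ Wq,
    np.vstack([K_cache, x_q @ Wk]),
    np.vstack([V_cache, x_q @ Wv]),
)

# Baseline without a cache: re-encode every token for each query.
X_full = np.vstack([X_knowledge, x_q])
out_full = attend(x_q @ Wq, X_full @ Wk, X_full @ Wv)
# out_cached equals out_full: caching changes the cost, not the result.
```

Per-query work drops from re-encoding all n knowledge tokens to projecting just the new token, which is the latency saving CAG claims over retrieval-plus-encode pipelines.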

Quick Start & Requirements

  • Install: pip install -r ./requirements.txt
  • Prerequisites: Download the datasets (sh ./downloads.sh) and create a .env file with your API keys.
  • Dependencies: Python, the meta-llama/Llama-3.1-8B-Instruct model, the squad and hotpotqa datasets, and bertscore for answer-similarity evaluation. A GPU is recommended for performance.
  • Docker: docker build -t my-cag-app . and docker run --gpus all -it --rm my-cag-app (or CPU variant).
  • Usage: python kvcache.py for CAG, python rag.py for RAG. See README for detailed parameter examples.
  • Paper: https://arxiv.org/abs/2412.15605

Highlighted Details

  • CAG eliminates retrieval latency and minimizes retrieval errors.
  • Achieves comparable or superior results to RAG with a simpler design.
  • Supports Llama-3.1-8B-Instruct and uses bertscore for answer-similarity evaluation.
  • Provides scripts for both CAG (kvcache.py) and RAG (rag.py) experiments.

Maintenance & Community

The project is associated with research presented at ACM Web Conference 2025. Acknowledgments mention support from Taiwan's National Science and Technology Council (NSTC) and Academia Sinica.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

CAG is limited by the LLM's context window size, making it less suitable for extremely large datasets. Performance may degrade with very long contexts, though ongoing LLM advancements are expected to mitigate this.

Health Check
Last commit

2 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
0
Star History
98 stars in the last 90 days

