Semantic cache for LLM queries, integrated with LangChain and LlamaIndex
GPTCache provides a semantic caching layer for Large Language Models (LLMs) to reduce API costs and improve response times. It's designed for developers building LLM-powered applications who face high operational expenses and latency issues. The library offers significant performance gains by storing and retrieving similar query results, effectively bypassing repeated LLM calls.
How It Works
GPTCache employs semantic caching, moving beyond simple exact-match retrieval. It converts user queries into embeddings using various embedding models and stores these in a vector database. When a new query arrives, GPTCache generates its embedding and performs a similarity search in the vector store to find semantically related past queries and their cached responses. This approach significantly increases cache hit rates compared to traditional methods.
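As a rough illustration of that flow, the sketch below wires an embedding model, a scalar store, a vector index, and a similarity evaluator into the cache; it follows the configuration shown in GPTCache's documentation (ONNX embeddings, SQLite, FAISS), and exact module paths or signatures may differ between versions.

# Sketch: configure GPTCache for semantic (similarity-based) caching.
# Module paths follow the GPTCache docs and may vary by version.
from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

onnx = Onnx()  # converts a query string into an embedding vector
data_manager = get_data_manager(
    CacheBase("sqlite"),                            # stores questions and answers
    VectorBase("faiss", dimension=onnx.dimension),  # stores embeddings for similarity search
)
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),  # decides whether a nearby hit is "close enough"
)
cache.set_openai_key()  # subsequent calls through the GPTCache OpenAI adapter check the cache first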
Quick Start & Requirements
Install the released package from PyPI:

pip install gptcache

Or install the latest development branch from source:

git clone -b dev https://github.com/zilliztech/GPTCache.git && cd GPTCache && pip install -r requirements.txt && python setup.py install
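Once installed, a minimal exact-match cache mirrors the basic example in GPTCache's documentation: initialize the cache and route OpenAI calls through the GPTCache adapter. This sketch assumes an OPENAI_API_KEY is set in the environment and uses the legacy-style ChatCompletion interface exposed by the adapter.

# Minimal quick-start sketch: exact-match caching of OpenAI chat completions.
from gptcache import cache
from gptcache.adapter import openai  # drop-in wrapper around the openai client

cache.init()             # default configuration: exact-match cache
cache.set_openai_key()   # reads OPENAI_API_KEY from the environment

answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain semantic caching in one sentence."}],
)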
Highlighted Details
Maintenance & Community
The project is actively developed by Zilliz. Contributions are welcomed, with a clear contribution guide available.
Licensing & Compatibility
GPTCache is released under the Apache-2.0 license, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
The project is under "swift development," meaning APIs may change. Support for new LLM APIs and models is no longer being added directly; users are encouraged to use the generic get and set APIs instead (a sketch follows below). Some module combinations might not be compatible, and a sanity check feature is in development.
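The sketch below shows how the generic API can wrap an arbitrary model: look the prompt up first, and only on a miss call your own LLM and store the answer. It assumes the put/get functions and get_prompt pre-processor found in gptcache.adapter.api and gptcache.processor.pre in recent versions; call_my_llm is a hypothetical placeholder for whatever provider call you use.

# Sketch: caching an arbitrary LLM's answers via GPTCache's generic API.
from gptcache import cache
from gptcache.adapter.api import put, get
from gptcache.processor.pre import get_prompt  # treat the raw prompt string as the cache key

cache.init(pre_embedding_func=get_prompt)

prompt = "What does GPTCache do?"
answer = get(prompt)              # cache lookup first; expected to return None on a miss
if answer is None:
    answer = call_my_llm(prompt)  # hypothetical: your own model/provider call
    put(prompt, answer)           # store the result for future similar queries
print(answer)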