ModelCache by codefuse-ai

LLM semantic cache for reducing response time via cached query-result pairs

Created 1 year ago
951 stars

Top 38.6% on SourcePulse

Project Summary

A semantic caching system for large language models (LLMs) that reduces response times and inference costs by caching query-result pairs. It is designed for businesses and research institutions looking to optimize LLM service performance and scalability.

How It Works

ModelCache uses a modular architecture built from adapter, embedding, similarity, and data-management components. The embedding module converts query text into vector representations; the similarity module scores incoming queries against cached entries; and the adapter module orchestrates the business logic that ties these components together. Data is managed through paired scalar and vector storage, and recent updates add Redis Search for faster embedding retrieval plus integration with embedding frameworks such as 'llmEmb', 'ONNX', and 'timm'.
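To make the flow concrete, here is a minimal, self-contained sketch of the query path a semantic cache like this follows: embed the query, search the vector store for a near match, and either return the cached answer or call the LLM and store the new pair. All names and the similarity threshold are hypothetical, and in-memory lists stand in for the real scalar and vector stores (MySQL/SQLite and Milvus/FAISS):

```python
from typing import List, Optional

import numpy as np

SIMILARITY_THRESHOLD = 0.9  # hypothetical tuning knob


def embed(text: str) -> np.ndarray:
    """Stand-in for the embedding module (ONNX/timm/llmEmb in practice)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.random(128)
    return vec / np.linalg.norm(vec)  # unit-normalize for cosine similarity


class SemanticCache:
    def __init__(self) -> None:
        self.vectors: List[np.ndarray] = []  # vector store (Milvus/FAISS in practice)
        self.answers: List[str] = []         # scalar store (MySQL/SQLite in practice)

    def lookup(self, query: str) -> Optional[str]:
        if not self.vectors:
            return None
        q = embed(query)
        scores = [float(q @ v) for v in self.vectors]  # cosine similarity of unit vectors
        best = int(np.argmax(scores))
        return self.answers[best] if scores[best] >= SIMILARITY_THRESHOLD else None

    def insert(self, query: str, answer: str) -> None:
        self.vectors.append(embed(query))
        self.answers.append(answer)


def answer_with_cache(cache: SemanticCache, query: str, call_llm) -> str:
    hit = cache.lookup(query)
    if hit is not None:
        return hit                    # cache hit: skip inference entirely
    result = call_llm(query)          # cache miss: pay the LLM cost once
    cache.insert(query, result)
    return result
```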

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.8+, MySQL, Milvus (for standard service). Demo uses SQLite and FAISS. Requires downloading embedding model bin files from Hugging Face.
  • Demo: python flask4modelcache_demo.py (a request sketch follows this list)
  • Standard Service: Configure milvus_config.ini and mysql_config.ini, then run python flask4modelcache.py.
  • Docker: docker-compose up (requires docker network create modelcache first).
  • Models: embedding model weights are hosted on Hugging Face.
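Once the demo service is running, you can exercise the cache over HTTP. The sketch below is hypothetical: the port, endpoint path, and payload fields are assumptions modeled loosely on the project's examples, so check the repository README for the exact request schema:

```python
import requests

# Assumed default bind of the Flask demo; verify against the README.
BASE_URL = "http://127.0.0.1:5000/modelcache"

# 1. Insert a query/answer pair into the cache (field names are assumptions).
insert_payload = {
    "type": "insert",
    "scope": {"model": "demo-model"},  # cache namespace / tenant
    "chat_info": [{
        "query": [{"role": "user", "content": "What is semantic caching?"}],
        "answer": "It reuses LLM answers for semantically similar queries.",
    }],
}
print(requests.post(BASE_URL, json=insert_payload, timeout=10).json())

# 2. Query with a paraphrase; a hit returns the cached answer without
#    touching the LLM.
query_payload = {
    "type": "query",
    "scope": {"model": "demo-model"},
    "query": [{"role": "user", "content": "Explain semantic caching."}],
}
print(requests.post(BASE_URL, json=query_payload, timeout=10).json())
```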

Highlighted Details

  • Integrates with LLM products as a lightweight, Redis-like cache, compatible with all LLM services.
  • Supports local embedding model loading and multiple embedding layers.
  • Implements data isolation for development/production environments and multi-tenancy.
  • Differentiates between long and short text for similarity assessment.
  • Optimized Milvus consistency level to "Session" for performance (see the sketch after this list).
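Milvus's "Session" consistency level guarantees that a client reads its own writes without forcing globally strict consistency, which lowers query latency. Below is a hedged pymilvus sketch of a search at that level; the collection name, field name, vector dimension, and index parameters are placeholders, not ModelCache's actual schema:

```python
from pymilvus import Collection, connections

# Placeholder connection and schema; not ModelCache's actual configuration.
connections.connect(host="127.0.0.1", port="19530")
collection = Collection("modelcache_embeddings")
collection.load()

query_vector = [0.0] * 768  # substitute the real query embedding here

results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 16}},
    limit=5,
    consistency_level="Session",  # read-your-own-writes, cheaper than "Strong"
)
print(results[0].ids)  # primary keys of the nearest cached entries
```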

Maintenance & Community

This project acknowledges inspiration from GPTCache. Contributions are welcomed via issues, suggestions, code, or documentation.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The project is actively under development with a "Todo List" including support for FastAPI, a visual interface, further inference optimization, and additional storage backends like MongoDB and Elasticsearch. Compatibility with specific inference engines like FasterTransformer is planned.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Travis Fischer (Founder of Agentic), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 1 more.

semantic-cache by upstash

285 stars
Semantic cache for natural language tasks
Created 1 year ago; updated 10 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

GPTCache by zilliztech

8k stars
Semantic cache for LLM queries, integrated with LangChain and LlamaIndex
Created 2 years ago; updated 2 months ago