supermemoryai/MemoryBench: Evaluate conversational memory and RAG systems with a unified benchmarking framework
Summary
MemoryBench is a unified, pluggable benchmarking framework for evaluating conversational memory and Retrieval-Augmented Generation (RAG) systems. It targets engineers and researchers who need to rigorously assess LLM context-management capabilities across diverse datasets and providers. Benchmarks, memory providers, and LLM judges are interchangeable, enabling direct side-by-side comparisons and detailed performance analysis.
How It Works
The core of MemoryBench is a modular pipeline with Ingest, Indexing, Search, Answer, Evaluate, and Report stages. Its pluggable architecture lets custom benchmarks (e.g., LoCoMo, LongMem) and memory providers (e.g., Supermem, Mem0, Zep) be added without modifying core code. The system is judge-agnostic, supporting multiple LLMs (GPT-4o, Claude, Gemini) as evaluators. Key advantages include checkpointed runs for resilience, multi-provider comparison, and structured reporting built around a novel MemScore metric that combines accuracy, latency, and token usage to capture nuanced performance trade-offs.
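As a rough illustration of how such a pluggable design can work (the interface names and signatures below are hypothetical sketches, not MemoryBench's actual API), a memory provider covers the pipeline's Ingest and Search stages while benchmarks and judges are swapped independently:

```typescript
// Hypothetical sketch of a pluggable provider/judge design;
// names and signatures are illustrative, not MemoryBench's real API.

interface ConversationTurn {
  role: "user" | "assistant";
  content: string;
}

// A memory provider implements the Ingest and Search stages.
interface MemoryProvider {
  name: string;
  ingest(sessionId: string, turns: ConversationTurn[]): Promise<void>;
  search(sessionId: string, query: string, topK: number): Promise<string[]>;
}

// A judge scores a generated answer against a reference answer (0..1).
interface Judge {
  name: string;
  score(question: string, answer: string, reference: string): Promise<number>;
}

// A benchmark supplies questions with ground-truth answers.
interface BenchmarkCase {
  sessionId: string;
  question: string;
  reference: string;
}

// The runner wires any provider, benchmark, and judge together,
// which is what enables side-by-side comparisons.
async function evaluate(
  provider: MemoryProvider,
  cases: BenchmarkCase[],
  judge: Judge,
  answer: (question: string, context: string[]) => Promise<string>,
): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const context = await provider.search(c.sessionId, c.question, 5);
    const generated = await answer(c.question, context);
    total += await judge.score(c.question, generated, c.reference);
  }
  return total / cases.length; // mean accuracy across cases
}
```

Similarly, a composite metric in the spirit of MemScore might fold accuracy, latency, and token usage into a single number. The weights and normalization below are assumptions for illustration; the README describes only the three dimensions, not the formula:

```typescript
// Hypothetical composite in the spirit of MemScore; the actual
// formula is not specified in the project's README.
function memScore(accuracy: number, latencyMs: number, tokens: number): number {
  const latencyPenalty = Math.min(latencyMs / 10_000, 1); // normalize to [0, 1]
  const tokenPenalty = Math.min(tokens / 100_000, 1);     // normalize to [0, 1]
  return 0.7 * accuracy - 0.2 * latencyPenalty - 0.1 * tokenPenalty;
}
```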
Quick Start & Requirements
Installation involves cloning the repository and running `bun install`. Configure API keys for the desired providers and judges by copying `.env.example` to `.env.local`. The primary command is `bun run src/index.ts run -p <provider> -b <benchmark>`. Prerequisites include the Bun runtime and API credentials for services such as OpenAI, Anthropic, and Google.
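A minimal quick-start sequence, assembled from the commands above (the repository URL and directory are placeholders, as they are not given in the summary):

```bash
# Clone the repository (URL placeholder) and install dependencies
git clone <repository-url>
cd <repo-directory>
bun install

# Configure API keys for providers and judges
cp .env.example .env.local
# ...then edit .env.local with OpenAI/Anthropic/Google credentials

# Run a benchmark against a provider
bun run src/index.ts run -p <provider> -b <benchmark>
```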
Maintenance & Community
The README does not list maintainers, community channels (e.g., Discord, Slack), or a project roadmap.
Licensing & Compatibility
The project is released under the MIT license, which permits broad usage, including commercial applications and integration into closed-source systems.
Limitations & Caveats
The framework's extensibility relies on user contributions for new providers, benchmarks, and judges, so performance and reliability may vary with the specific implementations of these pluggable components. API keys for the relevant LLM services must be configured before any run. The README does not specify an alpha/beta status or known bugs.