Evaluating long-term conversational memory in LLM agents
This repository introduces LoCoMo, a benchmark dataset and evaluation framework for assessing the very long-term conversational memory of LLM agents. It targets researchers and developers who need to rigorously test agent recall, response coherence, and retrieval-augmented generation (RAG) over extended multi-session dialogs, probing how well long-term context is maintained.
How It Works
LoCoMo features 10 annotated, very long conversations structured into sessions with timestamps, speakers, and dialog turns (including image URLs/metadata). The framework provides scripts for generating synthetic conversations using LLM agents with defined personas and for evaluating LLMs on Question Answering (QA) and Event Summarization. Generated 'observations' and 'session summaries' serve as RAG databases.
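To make the data layout concrete, here is a minimal sketch of walking one annotated conversation. The file path and every field name (sessions, date_time, speaker, text, img_url, blip_caption) are illustrative assumptions, not the repository's confirmed schema:

```python
import json

def walk_conversation(path: str) -> None:
    """Print each session's timestamp and dialog turns for every sample."""
    with open(path) as f:
        samples = json.load(f)  # assumed: a list of annotated conversations

    for sample in samples:
        for session in sample["sessions"]:  # assumed field name
            print(f"Session at {session['date_time']}")
            for turn in session["turns"]:
                line = f"  {turn['speaker']}: {turn['text']}"
                # Image turns carry a URL and a BLIP caption, not raw pixels.
                if turn.get("img_url"):
                    line += f" [image: {turn.get('blip_caption', '')}]"
                print(line)

if __name__ == "__main__":
    walk_conversation("data/locomo10.json")  # assumed file location
```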
Quick Start & Requirements
Configuration is handled via scripts/env.sh. Conversation generation uses bash scripts/generate_conversations.sh, which supports custom personas or sampling from the MSC dataset. Evaluation scripts (bash scripts/evaluate_gpts.sh, etc.) cover various LLM providers. Re-generating RAG data uses bash scripts/generate_observations.sh and bash scripts/generate_session_summaries.sh. API keys may be necessary.
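As a sketch of how the generated observations could back a RAG pipeline, the snippet below retrieves the observations most lexically similar to a QA question. The bag-of-words overlap score is a dependency-free stand-in for whatever retriever the evaluation scripts actually use, and the file name and structure are assumptions:

```python
import json
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words representation of a string."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def retrieve(question: str, observations: list[str], k: int = 5) -> list[str]:
    """Return the k observations sharing the most tokens with the question."""
    q = tokenize(question)
    scored = [(sum((q & tokenize(obs)).values()), obs) for obs in observations]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [obs for score, obs in scored[:k] if score > 0]

if __name__ == "__main__":
    # Assumed: generate_observations.sh emits a flat JSON list of strings.
    with open("data/observations.json") as f:
        observations = json.load(f)
    context = retrieve("When did the speakers first meet?", observations)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: ..."
    print(prompt)
```

In practice an embedding-based retriever would replace the overlap score, but the flow (retrieve relevant observations, prepend them to the QA prompt) is the same.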
Highlighted Details
Maintenance & Community
The provided README lacks specific details on community channels, project roadmaps, or notable contributors and sponsorships.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. This omission is a potential adoption blocker, especially for commercial use or integration into closed-source projects.
Limitations & Caveats
Images are not included; only web URLs, BLIP captions, and search queries are provided. The released dataset is a subset of 10 conversations, chosen to keep evaluation costs manageable. Event summarization evaluation and multimodal dialog generation are marked as "Coming soon."
Last updated 1 year ago; the repository is marked inactive.