PaperGuru-Benchmark by PaperGuru-AI

State-of-the-art memory for long-horizon LLM agents

Created 2 weeks ago

404 stars

Top 71.5% on SourcePulse

Project Summary

PaperGuru introduces Lifecycle-Aware Memory (LAM), a memory primitive aimed at the lack of durable long-term memory in long-horizon LLM agents. The project reports state-of-the-art results on the PaperBench and SurveyBench benchmarks and positions LAM as a foundational component for advanced agentic systems.

How It Works

The system employs a Capital Chunk Memory (CCM) architecture that separates memory into two surfaces: a bounded routing surface of chunk heads and an unbounded content surface that is accessed lazily. A central capital chunk indexes the heads, enabling traversal over a temporal artifact graph that unifies structural edges with historical-causality edges. Queries run through a route-first, expand-second, distill-last pipeline that produces provenance-grounded evidence cards. According to the project, this design jointly satisfies the four LAM axioms (versioned content, structural multi-hop relevance, bounded query cost, and provenance-grounded composition) and outperforms traditional flat retrieval and ad-hoc memory solutions.
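The route, expand, distill flow described above can be illustrated with a toy sketch. This is not the project's implementation: every name here (ChunkHead, CapitalChunk, EvidenceCard, the keyword-overlap scoring, the max_heads limit) is a hypothetical stand-in chosen for illustration, assuming only what the summary states, namely a bounded set of heads, lazily loaded content, graph edges between chunks, and versioned, provenance-carrying output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkHead:
    """Small routing entry (hypothetical): describes a chunk without holding its content."""
    chunk_id: str
    keywords: frozenset
    version: int
    edges: tuple = ()  # ids of related chunks (structural or historical links)


@dataclass(frozen=True)
class EvidenceCard:
    """Distilled excerpt that keeps its provenance (source chunk and version)."""
    chunk_id: str
    version: int
    excerpt: str


class CapitalChunk:
    """Toy index over chunk heads; content is fetched only through load_content."""

    def __init__(self, max_heads, load_content):
        self.max_heads = max_heads          # bound on the routing surface
        self.load_content = load_content    # lazy accessor for the content surface
        self.heads = {}

    def add_head(self, head):
        if head.chunk_id not in self.heads and len(self.heads) >= self.max_heads:
            raise ValueError("routing surface is full")
        self.heads[head.chunk_id] = head

    def route(self, query_terms, top_k):
        # Route first: score heads only (keyword overlap is an assumed stand-in).
        scored = [(len(h.keywords & query_terms), h.chunk_id) for h in self.heads.values()]
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [chunk_id for _, chunk_id in hits[:top_k]]

    def expand(self, seeds, hops):
        # Expand second: follow edges for a fixed number of hops.
        seen, frontier = list(seeds), list(seeds)
        for _ in range(hops):
            next_frontier = []
            for chunk_id in frontier:
                for neighbour in self.heads[chunk_id].edges:
                    if neighbour in self.heads and neighbour not in seen:
                        seen.append(neighbour)
                        next_frontier.append(neighbour)
            frontier = next_frontier
        return seen

    def distill(self, chunk_ids, max_chars=40):
        # Distill last: content is loaded here, and only for the selected chunks.
        return [
            EvidenceCard(cid, self.heads[cid].version, self.load_content(cid)[:max_chars])
            for cid in chunk_ids
        ]

    def query(self, query_terms, top_k=2, hops=1):
        seeds = self.route(frozenset(query_terms), top_k)
        return self.distill(self.expand(seeds, hops))


# Usage: record which chunks are actually loaded from the content surface.
store = {"method": "Loss is cross-entropy.", "results": "Accuracy table.", "intro": "Motivation."}
loaded = []

def loader(chunk_id):
    loaded.append(chunk_id)
    return store[chunk_id]

capital = CapitalChunk(max_heads=8, load_content=loader)
capital.add_head(ChunkHead("method", frozenset({"training", "loss"}), version=2, edges=("results",)))
capital.add_head(ChunkHead("results", frozenset({"accuracy"}), version=1))
capital.add_head(ChunkHead("intro", frozenset({"motivation"}), version=1))

cards = capital.query({"loss"})
for card in cards:
    print(card)
```

In this sketch the query touches content only for the routed chunk and its one-hop neighbour; the unrelated "intro" chunk is never loaded, which is the point of keeping routing and content on separate surfaces.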

Quick Start & Requirements

The repository focuses on benchmark results and reproducibility. To reproduce the figures, install matplotlib, numpy, and pillow (python3 -m pip install matplotlib numpy pillow). The repository is approximately 350 MB. No GPU, CUDA version, or specific Python version is stated as a requirement, though a working Python 3 environment is needed for figure generation. The README links to the full paper and benchmark details.
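Before running any figure scripts, it can help to confirm that the three packages are importable. The check below is a minimal sketch using only the standard library; the function name missing_packages and the REQUIRED mapping are illustrative and not part of the repository. It relies on the fact that Pillow is installed as "pillow" but imported as "PIL".

```python
import importlib.util

# PyPI distribution name -> module name used at import time.
REQUIRED = {"matplotlib": "matplotlib", "numpy": "numpy", "pillow": "PIL"}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose import module cannot be found."""
    return sorted(
        dist for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + " ".join(missing))
        print("Install with: python3 -m pip install " + " ".join(missing))
    else:
        print("All figure dependencies are importable.")
```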

Highlighted Details

  • Reports state-of-the-art results: 66.05% on PaperBench (vs. 35.74% baseline) and 94.66% on SurveyBench (vs. 80.60% baseline).
  • On PaperBench, 20 of 23 papers exceed the human-expert bar (41%), with a reported absolute mean lift of +30.21 percentage points.
  • On SurveyBench, the content average rises by 14.06 percentage points, and the generated composite artifacts are described as more than twice as rich as the baselines'.
  • The authors cite 10 peer-reviewed acceptances since Q4 2025 at venues including FSE, ICML, TOSEM, AEI, and ICoGB.
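The headline numbers above can be cross-checked with a few lines of arithmetic. The sketch below uses only figures quoted in this summary and exact decimal arithmetic; the variable names are illustrative. Note that the plain difference of the two PaperBench scores comes to 30.31 points, slightly off from the +30.21 mean lift quoted above; the summary does not say how that figure was aggregated, so the gap is left unexplained here.

```python
from decimal import Decimal

# (system score, baseline score) in percent, as quoted in this summary.
scores = {
    "PaperBench": (Decimal("66.05"), Decimal("35.74")),
    "SurveyBench": (Decimal("94.66"), Decimal("80.60")),
}

# Absolute lift in percentage points: system minus baseline.
lifts = {name: system - baseline for name, (system, baseline) in scores.items()}
for name, lift in lifts.items():
    print(f"{name}: +{lift} percentage points")

# Share of PaperBench papers above the human-expert bar (20 of 23).
share_above_bar = round(20 / 23 * 100, 1)
print(f"Papers above the human-expert bar: {share_above_bar}%")
```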

Maintenance & Community

Developed by researchers for researchers, the project highlights its academic track record. A WeChat QR code is provided for community engagement, but no other public community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

PaperGuru is distributed under the permissive MIT License. Reproductions within PaperBench inherit the licenses of their original papers, requiring user verification for redistribution. Generated surveys from SurveyBench are available for citation and quotation with attribution. This licensing is generally compatible with commercial use, provided individual PaperBench submission licenses are respected.

Limitations & Caveats

The README emphasizes achievements and does not explicitly detail limitations. The primary focus is on research benchmarks, and users must verify the licenses of individual PaperBench reproduction submissions, which may impose additional restrictions beyond the MIT license of PaperGuru itself.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

404 stars in the last 19 days
