reasoning-bank  by google-research

Agent self-evolution via reasoning memory and scaling

Created 2 months ago
256 stars

Top 98.5% on SourcePulse

Project Summary

ReasoningBank is a memory mechanism for AI agents that learns from both successful and failed trajectories and stores the reasoning it extracts, so that agents improve as they accumulate experience. It targets researchers working on software-engineering (SWE-Bench) and web-browsing (WebArena) agents, and presents experience-driven memory as a new dimension for scaling agent systems.

How It Works

The core innovation is ReasoningBank, a memory formulation that captures reasoning from agent experiences rather than only their outcomes. On top of this memory, the project enables "memory-aware test-time scaling," which exploits the synergy between memory and scaling strategies. The project positions experience-driven memory as a distinct scaling dimension for improving agent performance and adaptability.
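To make the idea of a reasoning memory concrete, here is a minimal Python sketch. It is not the repository's implementation: the class ToyReasoningBank, its fields, and the word-overlap retrieval are all hypothetical stand-ins chosen for illustration. It shows only the general pattern of storing a distilled lesson from each trajectory, whether it succeeded or failed, and retrieving the most relevant lessons for a new task.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    task: str      # description of the task the lesson came from
    lesson: str    # distilled reasoning, not the raw trajectory
    success: bool  # whether the source trajectory succeeded


@dataclass
class ToyReasoningBank:
    """Hypothetical toy memory; a real system would likely use embeddings."""
    items: list = field(default_factory=list)

    def add(self, task: str, lesson: str, success: bool) -> None:
        self.items.append(MemoryItem(task, lesson, success))

    def retrieve(self, query: str, k: int = 2) -> list:
        # Score each stored item by word overlap with the query
        # (a simple stand-in for similarity search).
        q = set(query.lower().split())
        scored = [(len(q & set(it.task.lower().split())), it) for it in self.items]
        scored = [(s, it) for s, it in scored if s > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [it for _, it in scored[:k]]


bank = ToyReasoningBank()
# Lessons come from both successful and failed attempts.
bank.add("find cheapest laptop on shopping site",
         "Sort by price before reading listings.", True)
bank.add("cancel order on shopping site",
         "Do not assume the first order row is the latest; check dates.", False)
bank.add("fix failing unit test in repo",
         "Reproduce the failure before editing code.", True)

hits = bank.retrieve("find cheapest phone on shopping site")
for item in hits:
    tag = "success" if item.success else "failure"
    print(f"[{tag}] {item.lesson}")
```

In this sketch the retrieved lessons would be prepended to the agent's prompt for the new task. Under memory-aware test-time scaling, one could imagine running several attempts per task and feeding lessons from all of them back into the bank, though the details of how the project does this are not described here.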

Quick Start & Requirements

  • Installation: run pip install -r requirements.txt.
  • LLM configuration: set environment variables for OpenAI API keys, or set up Google Cloud authentication for Gemini/Claude.
  • WebArena: requires browsergym installation, Docker setup, and data processing.
  • SWE-Bench: run pip install -e . in its third_party directory.
  • Execution: use run.sh; scaling experiments are managed by pipeline_scaling.py and induce_scaling.py.
  • Configuration: the model, output, website, and memory mode must all be specified.
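Because several settings are mandatory, a small preflight check before launching a run can save a failed job. The Python sketch below is an assumption-laden illustration, not part of the repository: the config key names (model, output, website, memory_mode) are hypothetical, and the environment variable names OPENAI_API_KEY and GOOGLE_APPLICATION_CREDENTIALS are the conventional ones for the OpenAI SDK and Google Cloud default credentials, which this project may or may not read.

```python
import os

# Hypothetical key names; check the repository's scripts for the real ones.
REQUIRED_FIELDS = ("model", "output", "website", "memory_mode")


def missing_settings(config: dict, env=None) -> list:
    """Return human-readable problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    problems = [f"config: {key}" for key in REQUIRED_FIELDS if not config.get(key)]
    # Assumed variable names (conventional for OpenAI and Google Cloud).
    if not (env.get("OPENAI_API_KEY") or env.get("GOOGLE_APPLICATION_CREDENTIALS")):
        problems.append("env: no LLM credentials found")
    return problems


# Example with one missing field and no credentials set.
example = {"model": "some-model", "output": "results/", "website": "shopping"}
for problem in missing_settings(example, env={}):
    print("missing ->", problem)
```

Passing env explicitly, as in the example, keeps the check easy to test without touching the real environment.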

Highlighted Details

  • Supports SWE-Bench (software engineering) and WebArena (web-browsing) benchmarks.
  • Vendors a patched webarena harness for improved robustness and evaluation.
  • Introduces "memory-aware test-time scaling" as a novel agent scaling method.

Maintenance & Community

Code is adopted from Agent-workflow-memory, webarena, and mini-swe-agent. The README provides no details on active contributors, sponsorships, community channels, or a roadmap.

Licensing & Compatibility

The repository's license is not explicitly stated. The project is intended for demonstration purposes, is not officially supported by Google, and is not intended for production environments.

Limitations & Caveats

This project is a demonstration and is neither production-ready nor officially supported. Because no license is specified, the terms of use are unclear, which may limit its usability and compatibility, especially for commercial applications.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

231 stars in the last 30 days
