google-research
Agent self-evolution via reasoning memory and scaling
Top 98.5% on SourcePulse
ReasoningBank introduces a memory mechanism for AI agents that learns from both successful and failed trajectories, storing the reasoning behind them so that agents improve from their own experience. It is aimed at researchers working on software-engineering (SWE-Bench) and web-browsing (WebArena) agents, and offers experience-driven memory as a new dimension for scaling agent systems.
How It Works
The core innovation is ReasoningBank, a memory formulation capturing reasoning from agent experiences. It enables "memory-aware test-time scaling," exploiting the synergy between memory and scaling strategies. This establishes experience-driven memory as a distinct scaling dimension for improved agent performance and adaptability.
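The idea of a memory that keeps reasoning from both successes and failures can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `MemoryItem` and `ReasoningMemory` names, the stored fields, and the word-overlap retrieval are hypothetical and are not taken from the repository, which may use a different schema and a different retrieval method.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    """One stored lesson (hypothetical schema, not the repository's)."""
    task: str           # task description the lesson came from
    lesson: str         # reasoning distilled from the trajectory, as free text
    from_success: bool  # False means the lesson came from a failed attempt


def _tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens, used for a crude relevance score."""
    return set(text.lower().split())


@dataclass
class ReasoningMemory:
    items: list[MemoryItem] = field(default_factory=list)

    def add(self, task: str, lesson: str, succeeded: bool) -> None:
        """Store a lesson whether the trajectory succeeded or failed."""
        self.items.append(MemoryItem(task, lesson, succeeded))

    def retrieve(self, query: str, k: int = 2) -> list[MemoryItem]:
        """Return up to k lessons whose source task shares words with the query.

        Word overlap is a stand-in chosen to keep the sketch dependency-free.
        """
        q = _tokens(query)
        scored = [(len(q & _tokens(item.task)), item) for item in self.items]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in relevant[:k]]


memory = ReasoningMemory()
memory.add(
    "find cheapest laptop on shopping site",
    "Sort by price before opening product pages.",
    True,
)
memory.add(
    "post a comment on forum thread",
    "Clicking submit without logging in fails; check login state first.",
    False,
)

# Lessons retrieved for a new task would be placed in the agent's prompt.
hits = memory.retrieve("find cheapest phone on shopping site", k=1)
for item in hits:
    print(("success" if item.from_success else "failure"), "->", item.lesson)
```

In this sketch a failed trajectory is stored on equal footing with a successful one, so a later task can be warned about a known pitfall as well as pointed to a strategy that worked.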
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Configure the LLM backend through environment variables: an OpenAI API key, or Google Cloud authentication for Gemini/Claude. WebArena additionally needs browsergym installed, Docker set up, and its data processed. SWE-Bench requires running pip install -e . in its third_party directory. Experiments are launched with run.sh; scaling experiments are managed by pipeline_scaling.py and induce_scaling.py. Each run must specify settings for model, output, website, and memory mode.
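Before launching a run, it can help to confirm that credentials for at least one LLM backend are present. The following is a small preflight sketch, not part of the repository. The variable names OPENAI_API_KEY and GOOGLE_APPLICATION_CREDENTIALS are the conventional names used by the OpenAI and Google Cloud client libraries and are assumed here; the project's README may call for different variables.

```python
import os
from collections.abc import Mapping


def detect_llm_backends(env: Mapping[str, str]) -> list[str]:
    """Return the backends whose credentials appear to be configured.

    Assumption: the variable names below are the conventional ones; the
    repository may expect different names.
    """
    backends = []
    if env.get("OPENAI_API_KEY"):
        backends.append("openai")
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        backends.append("google-cloud")
    return backends


configured = detect_llm_backends(os.environ)
if configured:
    print("Configured backends:", ", ".join(configured))
else:
    print("No LLM credentials found; set them before running experiments.")
```

The function takes the environment as an argument so it can be checked against a plain dictionary without touching the real process environment.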
Highlighted Details
webarena harness for improved robustness and evaluation.
Maintenance & Community
Code is adapted from Agent-workflow-memory, webarena, and mini-swe-agent. The README gives no details on active contributors, sponsorships, community channels, or a roadmap.
Licensing & Compatibility
The repository does not explicitly state a license. The project is for demonstration purposes, is not officially supported by Google, and is not intended for production environments.
Limitations & Caveats
This project is a demonstration; it is neither production-ready nor officially supported. The license is unspecified, which leaves the terms of use and redistribution unclear, particularly for commercial applications.