WebCanvas by iMeanAI

Web agent framework for online development, training, and evaluation

Created 1 year ago

276 stars

Top 94.1% on SourcePulse

Project Summary

WebCanvas is an open-source framework designed for building, training, and evaluating web agents in dynamic, real-time online environments. It addresses the limitations of static or isolated web agent development by providing a comprehensive suite of tools for realistic interaction and assessment, targeting researchers and developers building sophisticated web-based AI agents.

How It Works

WebCanvas employs a "KEY-NODE" based approach for web trajectory annotation, enabling granular, phase-based assessment of agent performance. It integrates live web environments for realistic feedback, supporting dynamic evaluation functions and offering metrics like USD efficiency. The framework is built with plug-and-play modules for planning, observation, memory, reward, action execution, and evaluation, facilitating easy iteration on LLM-based web agents.
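The key-node idea above can be illustrated with a minimal sketch. It is not the WebCanvas implementation: the `KeyNode` structure, the `url_includes` matcher, and `score_trajectory` are hypothetical names, and the sketch assumes key nodes are checked in order against the URLs an agent visits.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeyNode:
    """One required intermediate state of a task (hypothetical structure)."""
    description: str
    matches: Callable[[str], bool]  # predicate over an observed page URL


def url_includes(fragment: str) -> Callable[[str], bool]:
    """Build a matcher that passes when the URL contains `fragment`."""
    return lambda url: fragment in url


def score_trajectory(key_nodes: List[KeyNode], visited_urls: List[str]) -> dict:
    """Tick off key nodes in order as the agent's visited URLs satisfy them."""
    reached = 0
    for url in visited_urls:
        if reached < len(key_nodes) and key_nodes[reached].matches(url):
            reached += 1
    total = len(key_nodes)
    return {
        "reached": reached,
        "total": total,
        "completion_rate": reached / total if total else 0.0,
        "task_success": total > 0 and reached == total,
    }


if __name__ == "__main__":
    nodes = [
        KeyNode("open search results", url_includes("/search")),
        KeyNode("open the cart", url_includes("/cart")),
    ]
    trajectory = [
        "https://shop.example/",
        "https://shop.example/search?q=mug",
        "https://shop.example/item/1",
    ]
    print(score_trajectory(nodes, trajectory))
```

Scoring per key node rather than per final state is what makes partial progress visible: the agent in the usage example never reaches the cart, yet still receives credit for the first key node.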

Quick Start & Requirements

Install: run conda create -n webcanvas python=3.11, then conda activate webcanvas, then pip install -r requirements.txt.
Prerequisites: Node.js, Google API Key and Custom Search Engine ID for search actions, Browserbase API Key for cloud browser integration.
Setup: Requires API key configuration and Node.js dependencies.
Docs: How-to guide, data download, demo video.

Highlighted Details

Supports multiple LLM providers (OpenAI, Claude, Gemini, together.ai) and OpenAI's o1 models.
Introduces an evaluation system based on JavaScript event listeners, which decouples evaluation from the agent's action space.
Offers a "USD efficiency score" to quantify agent cost-effectiveness.
Provides the Mind2Web-Live dataset with 542 tasks and 2439 intermediate states for benchmarking.
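The summary does not define the USD efficiency score, so the following is only a sketch under a stated assumption: that cost-effectiveness is measured as dollars of LLM spend per key node reached, with lower being better. The function names and the per-million-token prices are hypothetical and are not taken from WebCanvas.

```python
def usd_cost(prompt_tokens: int, completion_tokens: int,
             usd_per_1m_prompt: float, usd_per_1m_completion: float) -> float:
    """Total LLM spend in USD given token counts and per-million-token prices."""
    return (prompt_tokens * usd_per_1m_prompt
            + completion_tokens * usd_per_1m_completion) / 1_000_000


def usd_per_key_node(total_usd: float, key_nodes_reached: int) -> float:
    """Assumed metric: dollars spent per key node reached (lower is better)."""
    if key_nodes_reached == 0:
        return float("inf")  # money was spent but no progress was made
    return total_usd / key_nodes_reached


if __name__ == "__main__":
    # Hypothetical prices: 2 USD per 1M prompt tokens, 8 USD per 1M completion tokens.
    spend = usd_cost(200_000, 50_000, 2.0, 8.0)
    print(spend, usd_per_key_node(spend, 4))
```

Normalizing by key nodes rather than by completed tasks lets two agents with the same task success rate be compared on how much they spent for each unit of progress.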

Maintenance & Community

Active development with recent releases (v0.0.4 in Dec 2024).
Community channels include GitHub Discussions and Discord.
Paper presented at ICML 2024 and ACL 2024 workshops.

Licensing & Compatibility

The repository does not explicitly state a license in the README.
Open data is available for research use.

Limitations & Caveats

The framework is at an early stage (v0.0.4), and several items remain on the TODO list, including batch evaluation, captcha-solving services, and integration with more benchmark datasets such as WebArena. The README notes that the experimental environment (e.g., Windows servers, US-based servers) can significantly affect agent performance.

