AgentLab by ServiceNow

Open-source framework for web agent development, testing, and benchmarking

Created 1 year ago

520 stars

Top 60.5% on SourcePulse


1 Expert Loves This Project

Vincent Weisser

Cofounder of Prime Intellect

Project Summary

AgentLab is an open-source framework for developing, testing, and benchmarking web agents, targeting researchers and developers in the AI agent space. It provides a scalable and reproducible environment to accelerate research by offering building blocks for agent creation, unified LLM API integration, and support for various benchmarks like WebArena and WorkArena.

How It Works

AgentLab leverages BrowserGym for web interaction and task execution, enabling agents to navigate and act within web environments. It utilizes Ray for large-scale parallel experiment execution, allowing for efficient testing of multiple agents across numerous tasks and seeds. The framework supports a unified LLM API, abstracting interactions with providers like OpenAI, Azure, and OpenRouter, and includes features for reproducibility and result analysis.
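The fan-out over agents, tasks, and seeds described above can be pictured with a small sketch. This is not AgentLab's code and it does not use Ray: it swaps in Python's standard-library thread pool so that it runs anywhere. The names run_episode and run_grid, the agent and task labels, and the fake scoring rule are all hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_episode(agent: str, task: str, seed: int) -> dict:
    """Hypothetical stand-in for one agent episode (no browser, no LLM)."""
    # Deterministic fake score so the sketch is reproducible.
    score = (len(agent) + len(task) + seed) % 2
    return {"agent": agent, "task": task, "seed": seed, "score": score}


def run_grid(agents, tasks, seeds, n_jobs=4):
    """Run every (agent, task, seed) combination, at most n_jobs at a time."""
    jobs = list(product(agents, tasks, seeds))
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # pool.map returns results in the same order as `jobs`.
        return list(pool.map(lambda job: run_episode(*job), jobs))


results = run_grid(
    agents=["agent_a", "agent_b"],
    tasks=["task_1", "task_2", "task_3"],
    seeds=[0, 1],
)
print(f"{len(results)} episodes completed")
```

Fixing the seed per job is what makes a rerun comparable; a real setup would presumably also record model and package versions alongside each result.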

Quick Start & Requirements

Install: pip install agentlab
Prerequisites: Python 3.11 or 3.12, plus Playwright browsers (installed with playwright install).
Environment Variables: OPENAI_API_KEY, OPENROUTER_API_KEY, AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AGENTLAB_EXP_ROOT.
Setup: Each benchmark must be prepared separately, following its own setup instructions.
Links: BrowserGym, Demo.
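Before launching an experiment, it can help to check which of the environment variables listed above are actually set. The following is a minimal sketch using only the standard library. The helper name env_report is hypothetical, and the sketch assumes nothing about which variables AgentLab treats as mandatory; that depends on the provider in use.

```python
import os

# Variable names taken from the list above.
AGENTLAB_VARS = (
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AGENTLAB_EXP_ROOT",
)


def env_report(env=None):
    """Map each variable name to True if it is set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in AGENTLAB_VARS}


for name, is_set in env_report().items():
    print(f"{name}: {'set' if is_set else 'missing'}")
```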

Highlighted Details

Supports parallel agent experiments via Ray.
Unified LLM API for OpenAI, Azure, OpenRouter, and self-hosted models.
Presented as the preferred framework for running benchmarks such as WebArena and VisualWebArena.
Includes reproducibility features and an analysis tool (AgentXray).
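To illustrate what a unified LLM API means in practice, here is a minimal sketch of the general pattern: a single chat function that dispatches to provider-specific backends. It is not AgentLab's actual interface. ChatRequest, register_backend, and chat are hypothetical names, and the backends are fakes that only echo their input instead of calling a real service.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ChatRequest:
    model: str
    prompt: str


Backend = Callable[[ChatRequest], str]
_BACKENDS: Dict[str, Backend] = {}


def register_backend(name: str, backend: Backend) -> None:
    """Associate a provider name with a callable that answers requests."""
    _BACKENDS[name] = backend


def chat(provider: str, model: str, prompt: str) -> str:
    """Single entry point: callers never touch provider-specific code."""
    try:
        backend = _BACKENDS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return backend(ChatRequest(model=model, prompt=prompt))


# Fake backends for illustration only; real ones would call a provider SDK.
register_backend("openai", lambda req: f"[openai:{req.model}] {req.prompt}")
register_backend("openrouter", lambda req: f"[openrouter:{req.model}] {req.prompt}")

print(chat("openai", "some-model", "hello"))
```

With this shape, switching providers is a change to one string argument, so agent code can stay identical across providers.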

Maintenance & Community

Developed by ServiceNow.
Active GitHub Actions for code formatting and tests.
Links to GitHub star counts and PyPI download statistics are provided.

Licensing & Compatibility

License: MIT according to the badge, but a link to an Apache 2.0 license is also present; confirm which license applies before relying on either.
Commercial use is permitted under MIT terms (and would also be permitted under Apache 2.0).

Limitations & Caveats

AgentLab is presented as a research framework, not a consumer product, and should be used with caution. Benchmarks like WebArena and VisualWebArena need roughly 5 minutes to reset an instance for each agent evaluation, and dependencies between tasks can limit parallelism; WorkArena is suggested when smoother parallel runs are needed. The Gradio interface used by AgentXray is noted as potentially unstable.
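The cost of the instance resets can be estimated with simple arithmetic. The sketch below assumes one reset per agent evaluation, a reset time of about 5 minutes as stated above, and that resets on separate benchmark instances fully overlap. The function name reset_overhead_minutes and the n_instances parameter are hypothetical.

```python
import math


def reset_overhead_minutes(n_evaluations, reset_minutes=5.0, n_instances=1):
    """Estimate wall-clock minutes spent on instance resets alone.

    Assumes one reset per agent evaluation and that resets on different
    instances run at the same time.
    """
    if n_evaluations < 0 or n_instances < 1:
        raise ValueError("need n_evaluations >= 0 and n_instances >= 1")
    rounds = math.ceil(n_evaluations / n_instances)
    return rounds * reset_minutes


# Three agent evaluations on one instance versus three instances.
print(reset_overhead_minutes(3))
print(reset_overhead_minutes(3, n_instances=3))
```

Under these assumptions, reset time grows linearly with the number of evaluations unless more instances are hosted.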
