AgentLab  by ServiceNow

Open-source framework for web agent development, testing, and benchmarking

created 1 year ago
371 stars

Top 77.4% on sourcepulse

GitHubView on GitHub
Project Summary

AgentLab is an open-source framework for developing, testing, and benchmarking web agents, targeting researchers and developers in the AI agent space. It provides a scalable and reproducible environment to accelerate research by offering building blocks for agent creation, unified LLM API integration, and support for various benchmarks like WebArena and WorkArena.

How It Works

AgentLab leverages BrowserGym for web interaction and task execution, enabling agents to navigate and act within web environments. It utilizes Ray for large-scale parallel experiment execution, allowing for efficient testing of multiple agents across numerous tasks and seeds. The framework supports a unified LLM API, abstracting interactions with providers like OpenAI, Azure, and OpenRouter, and includes features for reproducibility and result analysis.

Quick Start & Requirements

  • Install: pip install agentlab
  • Prerequisites: Python 3.11 or 3.12, Playwright (playwright install).
  • Environment Variables: OPENAI_API_KEY, OPENROUTER_API_KEY, AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AGENTLAB_EXP_ROOT.
  • Setup: Requires preparing specific benchmarks as per their instructions.
  • Links: BrowserGym, Demo.

Highlighted Details

  • Supports parallel agent experiments via Ray.
  • Unified LLM API for OpenAI, Azure, OpenRouter, and self-hosted models.
  • Preferred framework for benchmarks like WebArena and VisualWebArena.
  • Includes reproducibility features and an analysis tool (AgentXray).

Maintenance & Community

  • Developed by ServiceNow.
  • Active GitHub Actions for code formatting and tests.
  • Links to GitHub stars and PyPI downloads are provided.

Licensing & Compatibility

  • License: MIT (as indicated by badge, though a link to Apache 2.0 is also present; clarification recommended).
  • Compatible with commercial use under MIT license terms.

Limitations & Caveats

AgentLab is presented as a research framework, not a consumer product, and should be used with caution. Benchmarks like WebArena and VisualWebArena have a ~5-minute instance reset time per agent evaluation, and task dependencies can limit parallelism; WorkArena is suggested for smoother parallel experiences. Gradio for AgentXray is noted as potentially unstable.

Health Check
Last commit

1 day ago

Responsiveness

1 week

Pull Requests (30d)
20
Issues (30d)
3
Star History
56 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of AI Engineering, Designing Machine Learning Systems) and Robert Stojnic Robert Stojnic(Creator of Papers with Code).

Agent-S by simular-ai

1.2%
6k
Agentic framework for autonomous computer interaction
created 9 months ago
updated 20 hours ago
Feedback? Help us improve.