Webwright by microsoft

Browser agent framework for state-of-the-art web task completion

Created 3 months ago

5,828 stars

Top 8.6% on SourcePulse

1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

Webwright is a framework that turns Large Language Models (LLMs) into browser agents for complex, long-horizon web tasks. It targets engineers and researchers working on LLM-driven web automation. Its main benefit is more robust, efficient, and reusable web interaction, achieved by applying the LLM's coding ability inside a programmable browser environment.

How It Works

Webwright adopts a "code-as-action" paradigm that separates the LLM agent from the browser. Instead of predicting discrete UI actions, the agent writes Python scripts that use Playwright to control browser sessions directly. The browser is treated as a disposable environment, while persistent state lives in the local workspace as code, logs, and screenshots. This lets agents compose complex workflows, use loops and abstractions, and iterate on solutions much as a human developer would, which leads to more reliable and adaptable web automation.
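As a rough illustration of the workspace idea described above, the sketch below persists one agent-written script to a directory, runs it in a separate process, and saves its output as a log. This is not Webwright's implementation: `run_step`, the `step_NNN` file naming, and `EXAMPLE_AGENT_SCRIPT` are hypothetical names chosen for the example. The Playwright script is shown only as the kind of text an agent might produce; it is not executed here because it needs Playwright and Chromium installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical example of a script an agent might write. It is only stored
# as text here; running it requires Playwright and Chromium.
EXAMPLE_AGENT_SCRIPT = '''
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()          # fresh, disposable browser
    page = browser.new_page()
    page.goto("https://example.com")
    page.screenshot(path="step_001.png")   # evidence kept in the workspace
    print(page.title())                    # stdout becomes the step log
    browser.close()
'''


def run_step(workspace: Path, step: int, script: str, timeout: float = 120.0) -> dict:
    """Persist one script in the workspace, run it, and persist its log."""
    workspace.mkdir(parents=True, exist_ok=True)
    script_path = workspace / f"step_{step:03d}.py"
    log_path = workspace / f"step_{step:03d}.log"
    script_path.write_text(script, encoding="utf-8")

    # Run in a child process with the workspace as the working directory,
    # so relative paths (screenshots, data files) land next to the script.
    proc = subprocess.run(
        [sys.executable, script_path.name],
        cwd=workspace,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    log_path.write_text(proc.stdout + proc.stderr, encoding="utf-8")
    return {"returncode": proc.returncode, "stdout": proc.stdout, "log": log_path}


if __name__ == "__main__":
    ws = Path(tempfile.mkdtemp(prefix="workspace_"))
    # A trivial stand-in script so the demo runs without a browser.
    result = run_step(ws, 1, 'print("hello from the workspace")')
    print(result["returncode"], result["stdout"].strip())
```

Because the script and its log both stay on disk, a later step can reread or rerun them even though the browser process that produced them is gone.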

Quick Start & Requirements

Primary Install: pip install -e .
Prerequisites: Python 3.10+, Playwright with Chromium installed (playwright install chromium), and an API key for a supported backend (OpenAI, Anthropic, or OpenRouter).
Run Command: python -m webwright.run.cli -c base.yaml -c model_openai.yaml -t "Task instruction" --start-url <url> --task-id <id> -o <output_dir>
Documentation: project page at https://microsoft.github.io/Webwright
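For scripted or batch runs, the run command above can be assembled as an argument list rather than a shell string. The sketch below is a minimal helper, assuming only the module path and flags shown in the run command (`-c`, `-t`, `--start-url`, `--task-id`, `-o`); the function name `build_run_command` and all argument values are placeholders, and nothing here is taken from Webwright's own code.

```python
import shlex
import sys


def build_run_command(task, start_url, task_id, output_dir,
                      configs=("base.yaml", "model_openai.yaml")):
    """Return the CLI invocation as an argv list (hypothetical helper)."""
    cmd = [sys.executable, "-m", "webwright.run.cli"]
    for cfg in configs:
        cmd += ["-c", cfg]          # one -c flag per config file
    cmd += [
        "-t", task,
        "--start-url", start_url,
        "--task-id", task_id,
        "-o", output_dir,
    ]
    return cmd


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = build_run_command(
        task="Find the contact email on the page",
        start_url="https://example.com",
        task_id="demo-001",
        output_dir="outputs/demo-001",
    )
    # shlex.join quotes arguments safely for display or logging.
    print(shlex.join(cmd))
    # To actually launch it (requires Webwright installed and an API key):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when the task instruction contains spaces or quotes.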

Highlighted Details

Achieves state-of-the-art results on challenging benchmarks like Online-Mind2Web and Odysseys, significantly outperforming prior methods.
The "code-as-action" approach enables efficient composition of complex workflows and reusable scripts, contrasting with step-by-step or coordinate-based interaction models.
Features a Task2UI mode that renders task completion results into easily viewable and reusable HTML-based web applications.
Offers pluggable model backends and integrates as a plugin or skill for LLM frameworks including Claude Code, OpenAI Codex, OpenClaw, and Hermes Agent.
Designed with a minimalist architecture, featuring a core agent loop of approximately 450 lines of Python code.
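To give a feel for how small a code-as-action loop can be, here is a toy version under stated assumptions. It is not Webwright's agent loop: `agent_loop`, `execute`, `Step`, and `scripted_model` are hypothetical names, the "model" is a hard-coded stand-in for an LLM call, and scripts run in-process with `exec` instead of driving a browser. The point it illustrates is the feedback cycle: a script fails, the error is recorded in the history, and the next proposal fixes it.

```python
import contextlib
import io
import traceback
from dataclasses import dataclass


@dataclass
class Step:
    script: str
    ok: bool
    output: str


def execute(script: str) -> tuple[bool, str]:
    """Run a script in-process; return success flag plus stdout or traceback."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(script, {"__name__": "__agent_step__"})
        return True, buf.getvalue()
    except Exception:
        return False, buf.getvalue() + traceback.format_exc()


def agent_loop(task, propose, max_steps=5) -> list[Step]:
    """Ask for a script, run it, record the result, repeat until done."""
    history: list[Step] = []
    for _ in range(max_steps):
        script = propose(task, history)   # an LLM call in a real agent
        if script is None:                # the proposer signals completion
            break
        ok, output = execute(script)
        history.append(Step(script, ok, output))
    return history


def scripted_model(task, history):
    """Stand-in for an LLM: a buggy attempt, then a fix, then stop."""
    if not history:
        return "print(total)"                         # fails: undefined name
    if not history[-1].ok:
        return "total = sum(range(5))\nprint(total)"  # corrected attempt
    return None


if __name__ == "__main__":
    for i, step in enumerate(agent_loop("add some numbers", scripted_model), 1):
        print(i, "ok" if step.ok else "failed")
```

A real agent would replace `scripted_model` with a model call that sees the task and prior logs, and `execute` with something that runs Playwright scripts in an isolated process.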

Maintenance & Community

The project has seen recent updates (May 2026) introducing new features and integrations, indicating active development. It supports multiple LLM frameworks, suggesting a focus on broad adoption within the AI agent ecosystem. No specific community channels (like Discord or Slack) are listed in the README.

Licensing & Compatibility

The provided README does not explicitly state a software license. This omission makes it unclear whether the project is intended for open-source use, modification, or commercial application. Users should seek clarification on licensing before adoption.

Limitations & Caveats

The most significant limitation is the absence of a clear license, creating uncertainty regarding usage rights, particularly for commercial purposes. While designed to be minimal, users may need to invest time in understanding its specific architecture and integrating it with their chosen LLM backend. The effectiveness of Webwright is also dependent on the underlying LLM's ability to generate correct and effective Playwright scripts.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

172 stars in the last 30 days