web-eval-agent by refreshdotdev

MCP server for autonomous web app evaluation

Created 11 months ago

1,234 stars

Top 31.5% on SourcePulse


1 Expert Loves This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Project Summary

web-eval-agent is an MCP server that lets a coding agent inside the code editor autonomously run, test, and debug the web application it is building. It is aimed at developers who want to automate web app testing and debugging, bringing quality assurance into the same editor workflow where the code is written.

How It Works

The agent uses BrowserUse to navigate the web application while capturing network traffic and console errors. It filters the captured network requests and returns the relevant ones to the coding agent's context window. The core feature is autonomous debugging: a Cursor agent calls the web QA agent MCP server to check that the code it has just written actually works, so end-to-end testing happens directly inside the development workflow.
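The README does not describe how requests are filtered, so the following is only a minimal sketch of one plausible heuristic, not the project's implementation. It assumes a hypothetical `CapturedRequest` record and hypothetical `NOISY_TYPES`/`NOISY_HOSTS` lists: static assets and analytics beacons are dropped, failed requests are always kept, and failures are sorted first so they survive truncation to a context-window budget.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


# Hypothetical record for one captured request; field names are assumptions.
@dataclass
class CapturedRequest:
    method: str
    url: str
    status: int
    resource_type: str  # e.g. "document", "xhr", "fetch", "image", "script"


# Assumed heuristic: these rarely help when debugging app logic.
NOISY_TYPES = {"image", "font", "stylesheet", "media"}
NOISY_HOSTS = ("google-analytics.com", "googletagmanager.com", "doubleclick.net")


def is_relevant(req: CapturedRequest) -> bool:
    host = urlparse(req.url).hostname or ""
    # Drop analytics/ad hosts, including their subdomains.
    if any(host == h or host.endswith("." + h) for h in NOISY_HOSTS):
        return False
    # Failed requests are always worth surfacing to the coding agent.
    if req.status >= 400:
        return True
    return req.resource_type not in NOISY_TYPES


def filter_requests(reqs: list[CapturedRequest], limit: int = 50) -> list[CapturedRequest]:
    kept = [r for r in reqs if is_relevant(r)]
    # Stable sort: failures (key False) come before successes (key True),
    # preserving capture order within each group.
    kept.sort(key=lambda r: r.status < 400)
    return kept[:limit]


# Usage with made-up sample traffic.
reqs = [
    CapturedRequest("GET", "https://app.local/logo.png", 200, "image"),
    CapturedRequest("POST", "https://app.local/api/login", 500, "fetch"),
    CapturedRequest("GET", "https://www.google-analytics.com/collect", 200, "xhr"),
    CapturedRequest("GET", "https://app.local/api/me", 200, "fetch"),
]
for r in filter_requests(reqs):
    print(r.method, r.url, r.status)
```

Putting failures first is a design choice for the assumed `limit`: if the agent's context budget forces truncation, the 4xx/5xx responses that usually explain a bug are the last to be cut.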

Quick Start & Requirements

macOS/Linux: Requires Homebrew, npm, and jq. Installation is via a provided script: curl -LSf https://operative.sh/install.sh | bash.
Windows (Cline): Requires an API key from operative.sh, uv (via curl -LsSf https://astral.sh/uv/install.sh | sh), and Playwright (uvx --from git+https://github.com/Operative-Sh/web-eval-agent.git playwright install).
Setup involves downloading an installer script and potentially restarting the IDE.

Highlighted Details

Browser navigation is reportedly 2x faster with the operative backend.
Captures and intelligently filters network traffic and console logs.
Enables autonomous end-to-end testing of code generated by agents.
Provides detailed reports including agent steps, console logs, network requests, and a chronological timeline.
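One way to picture the chronological timeline is as a k-way merge of separately recorded, already time-ordered streams (agent steps, console logs, network events). The sketch below is an illustration under that assumption, not the project's report format; the `Event` type, its fields, and the `render` layout are all hypothetical.

```python
import heapq
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    ts: float    # seconds since session start (assumed unit)
    kind: str    # "step", "console", or "network" (assumed labels)
    detail: str


def build_timeline(*streams: Iterable[Event]) -> list[Event]:
    # Each stream is assumed to be sorted by ts already; heapq.merge
    # interleaves them lazily into one ordered sequence.
    return list(heapq.merge(*streams, key=lambda e: e.ts))


def render(timeline: list[Event]) -> str:
    return "\n".join(f"[{e.ts:7.2f}s] {e.kind:<7} {e.detail}" for e in timeline)


# Usage with made-up events from one session.
steps = [
    Event(0.0, "step", "open http://localhost:3000"),
    Event(2.5, "step", "click 'Sign in'"),
]
network = [Event(2.7, "network", "POST /api/login -> 500")]
console = [Event(2.9, "console", "TypeError: user is undefined")]

print(render(build_timeline(steps, network, console)))
```

Interleaving the streams places a failing request directly between the step that triggered it and the console error it caused, which is the kind of cause-and-effect a debugging agent needs to see.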

Maintenance & Community

The README lists recent updates (4/29) that added an agent overlay and browser controls. Users are encouraged to open issues for any problems they encounter.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The README notes initial Playwright issues that were addressed on 4/14. Users are directed to open issues for any further problems.

Health Check

Last Commit

2 weeks ago

Responsiveness

1 day

Pull Requests (30d)