stagehand by browserbase

AI browser automation framework for production

Created 1 year ago

20,214 stars

Top 2.2% on SourcePulse


18 Experts Love This Project

Will Brown

Research Lead at Prime Intellect

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Luis Capelo

Cofounder of Lightning AI

Søren Bramer Schmidt

Cofounder of Prisma

and 14 more!

Project Summary

Stagehand is a production-ready framework for AI-powered browser automation, aimed at developers who want both low-level control and AI-driven flexibility. It integrates state-of-the-art AI models from OpenAI and Anthropic, so developers can use natural language for complex navigation and keep code-based control for predictable tasks.

How It Works

Stagehand sits between traditional automation tools like Selenium and unpredictable AI agents. Developers can call the act() function for AI-driven interaction with unfamiliar pages and use Playwright directly for known sequences. For more complex, multi-step AI tasks, it integrates "Computer Use" agents. The framework also supports previewing AI actions before they run and caching repeatable steps to reduce token usage and execution time.
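The caching idea described above can be sketched in a few lines. This is a minimal illustration, not Stagehand's implementation: ActionCache, ResolvedAction, Resolver, and fakeResolver are hypothetical names invented for this example, and the resolver is synchronous for brevity where a real model call would be asynchronous. It assumes that a natural-language instruction resolves to a concrete action, which can then be reused without another model call.

```typescript
// Hypothetical types for illustration; none of these names come from Stagehand's API.
type ResolvedAction = {
  selector: string;
  method: "click" | "fill";
  value?: string;
};

// Stand-in for a model call that maps an instruction to a concrete action.
type Resolver = (instruction: string) => ResolvedAction;

class ActionCache {
  private store: Map<string, ResolvedAction> = new Map();
  private resolve: Resolver;
  // Counts how often the (expensive) resolver was actually invoked.
  public resolverCalls = 0;

  constructor(resolve: Resolver) {
    this.resolve = resolve;
  }

  // Return the cached action for an instruction; call the resolver only on a miss.
  get(instruction: string): ResolvedAction {
    // Normalize so trivially different phrasings share one cache entry.
    const key = instruction.trim().toLowerCase();
    const hit = this.store.get(key);
    if (hit !== undefined) {
      return hit;
    }
    this.resolverCalls += 1;
    const action = this.resolve(instruction);
    this.store.set(key, action);
    return action;
  }
}

// Fake resolver for the demo; a real one would send the instruction to a model.
const fakeResolver: Resolver = (instruction) => ({
  selector: `[data-demo="${instruction.trim().length}"]`,
  method: "click",
});
```

With this sketch, repeating the same instruction triggers the resolver once, so only the first run pays for tokens and latency. A real cache would also need invalidation for when the page changes and a cached selector stops matching.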

Quick Start & Requirements

Scaffold a new project with npx create-browser-app.
Requires an API key for an LLM provider (OpenAI or Anthropic) and Browserbase credentials, both configured in a .env file.
Dependencies: Node.js, npm, and Playwright.
See docs.stagehand.dev and the Quickstart Guide.
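Because a run depends on several credentials, it can help to check the environment before launching a browser session. The sketch below is one way to do that. The variable names (BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID, OPENAI_API_KEY, ANTHROPIC_API_KEY) are assumptions for illustration and should be checked against the project's documentation; checkEnv and isSet are hypothetical helpers, not part of Stagehand.

```typescript
// Assumed variable names; confirm the exact names in the project's docs.
const BROWSER_KEYS: string[] = ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"];
const LLM_KEYS: string[] = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"];

type Env = Record<string, string | undefined>;

// A key counts as set only if it holds a non-blank string.
function isSet(env: Env, key: string): boolean {
  const value = env[key];
  return typeof value === "string" && value.trim().length > 0;
}

// Returns a list of problems; an empty list means the config looks complete.
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  // Every browser credential is required.
  for (const key of BROWSER_KEYS) {
    if (!isSet(env, key)) {
      problems.push(`missing ${key}`);
    }
  }
  // At least one LLM provider key is required.
  if (!LLM_KEYS.some((key) => isSet(env, key))) {
    problems.push(`need one of: ${LLM_KEYS.join(", ")}`);
  }
  return problems;
}
```

In a Node.js script this could be called as checkEnv(process.env) after the .env file has been loaded, stopping with a clear message if the returned list is not empty.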

Highlighted Details

Integrates SOTA computer use models from OpenAI and Anthropic with one line of code.
Allows developers to choose between code (Playwright) and natural language for specific actions.
Features preview and caching for AI actions to save time and tokens.
Built on Playwright for a resilient automation backbone.

Maintenance & Community

Active development with a focus on reliability, speed, and cost.
Contributions are welcome; reach out on Slack first to align with the maintainers.
Links: Slack community

Licensing & Compatibility

Licensed under the MIT License.
Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The framework depends on external LLM API keys and Browserbase credentials, both of which incur usage costs. Although it targets production use, its AI-driven components can still behave unpredictably.

Health Check

Last Commit

1 day ago

Responsiveness

1 day

Pull Requests (30d)

109

Issues (30d)

Star History

838 stars in the last 30 days