SeeAct by OSU-NLP-Group

Web agent for autonomous task completion on websites

Created 2 years ago

812 stars

Top 43.6% on SourcePulse

1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

SeeAct is a generalist web agent system that autonomously executes tasks on any website, built primarily around Large Multimodal Models (LMMs) such as GPT-4V. It provides a codebase for running web agents on live websites and a framework that uses LMMs to complete tasks. It targets researchers and developers building automated web interaction tools.

How It Works

SeeAct has two components: a Playwright-based tool for interfacing with live websites and an LMM-driven agent framework. The Playwright tool sits between the agent and the browser: it translates the agent's predicted actions into browser events and relays input from the browser back to the agent. Because the agent interacts with live web pages directly, it can be evaluated and demonstrated in realistic environments.
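The intermediary role described above can be illustrated with a minimal sketch. This is not SeeAct's actual code: the `Action` schema, the `RecordingBrowser` stand-in, and the `translate` and `run` helpers are all hypothetical names chosen for illustration, and a scripted list of actions takes the place of real LMM predictions. Only the method names `click` and `fill` are borrowed from Playwright's page API.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step predicted by the model (hypothetical schema, not SeeAct's)."""
    op: str             # "CLICK", "TYPE", or "TERMINATE"
    selector: str = ""  # CSS selector of the target element
    value: str = ""     # text to type, if any


@dataclass
class RecordingBrowser:
    """Stand-in for a real browser page; it only records events."""
    events: list = field(default_factory=list)

    def click(self, selector: str) -> None:
        self.events.append(("click", selector))

    def fill(self, selector: str, value: str) -> None:
        self.events.append(("fill", selector, value))

    def observe(self) -> dict:
        # A real tool would return a screenshot and page elements here.
        return {"events_so_far": len(self.events)}


def translate(action: Action, browser: RecordingBrowser) -> bool:
    """Turn one agent action into a browser event. Returns False to stop."""
    if action.op == "CLICK":
        browser.click(action.selector)
    elif action.op == "TYPE":
        browser.fill(action.selector, action.value)
    elif action.op == "TERMINATE":
        return False
    else:
        raise ValueError(f"unsupported action: {action.op}")
    return True


def run(plan: list, browser: RecordingBrowser, max_steps: int = 10) -> list:
    """Observe the page, execute the next action, and stop on TERMINATE."""
    observations = []
    for action in plan[:max_steps]:
        observations.append(browser.observe())
        if not translate(action, browser):
            break
    return observations
```

Keeping the translation step separate from the model means the same loop could drive a recording stub, as here, or a live Playwright page, without changing how actions are predicted.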

Quick Start & Requirements

Install via pip: pip install seeact
Requires Python 3.11.
Set up Playwright and install browser kernels: playwright install
OpenAI API key is required for OpenAI models. Gemini API key is required for Gemini models.
See official documentation for detailed usage and setup.
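The requirements above can be checked before a first run with a small preflight script. This is a sketch under stated assumptions, not part of SeeAct: the `preflight` helper is hypothetical, and the environment variable names `OPENAI_API_KEY` and `GEMINI_API_KEY` are assumed conventions that should be confirmed against the official documentation.

```python
import os
import sys

# Assumed environment variable names; confirm them in the official docs.
REQUIRED_KEYS = {"openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY"}


def preflight(provider: str, env: dict, version: tuple = sys.version_info[:2]) -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The listing states Python 3.11 as the requirement.
    if tuple(version) != (3, 11):
        problems.append(f"Python 3.11 required, found {version[0]}.{version[1]}")
    # Only providers listed in REQUIRED_KEYS need an API key here.
    key = REQUIRED_KEYS.get(provider)
    if key and not env.get(key):
        problems.append(f"{key} is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight("openai", dict(os.environ)):
        print(problem)
```

Passing the environment and version in as arguments keeps the check easy to test without touching the real process environment.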

Highlighted Details

Supports multiple LMMs including OpenAI (GPT-4V, GPT-4 Turbo, GPT-4o), Google Gemini, and Ollama (LLaVA).
Offers a "Crawler Mode" for autonomous exploration of websites.
Includes a "Demo Mode" for interactive task execution and a "Configuration File" for customization.
Provides the Multimodal-Mind2Web dataset for training and evaluation.

Maintenance & Community

Active development with recent updates including a Chrome Extension, EMNLP'24 acceptance, and crawler mode.
Contact information for contributors and the OSU NLP group is available.
Project news is posted on Twitter.

Licensing & Compatibility

Code licensed under OPEN RAIL-S.
Data licensed under OPEN RAIL-D.
Model weights and parameters licensed under OPEN RAIL-M.
These licenses may restrict commercial use or redistribution; review their terms before reusing the code, data, or weights.

Limitations & Caveats

SeeAct is experimental research software and should be monitored closely while it runs. The project states that it does not support direct login actions and advises against using it for tasks that require account access, citing safety and legal risks.
