SeeAct by OSU-NLP-Group

Web agent for autonomous task completion on websites

created 1 year ago
766 stars

Top 46.5% on sourcepulse

Project Summary

SeeAct is a generalist web agent system that autonomously executes tasks on any website, built primarily around Large Multimodal Models (LMMs) such as GPT-4V. It provides a codebase for running web agents on live websites and a framework that uses LMMs for task completion. It targets researchers and developers building automated web interaction tools.

How It Works

SeeAct has two components: a Playwright-based tool that interfaces with live websites and an LMM-driven framework. The Playwright tool acts as the intermediary, translating agent actions into browser events and passing browser input back to the agent. Because the agent interacts directly with live web pages, it can be evaluated and demonstrated in realistic environments.
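As an illustration of the intermediary role described above, here is a minimal Python sketch of how an agent action might be translated into a browser event. The `AgentAction` schema and the `execute` function are hypothetical and are not taken from the SeeAct codebase. The sketch assumes only that `page` behaves like a Playwright `Page` from the sync API (`locator`, `keyboard.press`, `goto`).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AgentAction:
    """One step proposed by the model (hypothetical schema, not SeeAct's own)."""
    kind: str                       # "click", "type", "select", "press", or "goto"
    selector: Optional[str] = None  # CSS selector of the target element, if any
    value: Optional[str] = None     # text to type, option to select, key, or URL


def execute(page: Any, action: AgentAction) -> None:
    """Translate an agent action into a browser event.

    `page` is assumed to expose the Playwright sync Page interface.
    """
    if action.kind == "click":
        page.locator(action.selector).click()
    elif action.kind == "type":
        # fill() replaces the current content of an input with the given text.
        page.locator(action.selector).fill(action.value)
    elif action.kind == "select":
        page.locator(action.selector).select_option(action.value)
    elif action.kind == "press":
        page.keyboard.press(action.value)
    elif action.kind == "goto":
        page.goto(action.value)
    else:
        # Reject anything outside the known action set instead of guessing.
        raise ValueError(f"unsupported action kind: {action.kind!r}")
```

With a real browser, `page` would come from `playwright.sync_api.sync_playwright()` after launching a browser and calling `new_page()`. Keeping the dispatcher independent of how the page is created also lets it be exercised with a stub object.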

Quick Start & Requirements

  • Install via pip: pip install seeact
  • Requires Python 3.11.
  • Set up Playwright and install browser kernels: playwright install
  • OpenAI API key is required for OpenAI models. Gemini API key is required for Gemini models.
  • See official documentation for detailed usage and setup.
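The requirements above can be checked before a run with a small preflight script. The sketch below is illustrative only: the environment variable names `OPENAI_API_KEY` and `GEMINI_API_KEY` are assumptions (consult the official documentation for the names SeeAct actually reads), as is treating the Python requirement as exactly 3.11 and treating Ollama as needing no API key.

```python
import os
import sys
from typing import Mapping, Optional, Tuple

# Assumed variable names, not confirmed by the project; None means no key needed.
API_KEY_VARS: Mapping[str, Optional[str]] = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "ollama": None,
}


def missing_prerequisites(
    provider: str,
    env: Mapping[str, str] = os.environ,
    version: Tuple[int, int] = tuple(sys.version_info[:2]),
) -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version) != (3, 11):
        problems.append(f"Python 3.11 required, found {version[0]}.{version[1]}")
    if provider not in API_KEY_VARS:
        problems.append(f"unknown provider: {provider}")
    else:
        var = API_KEY_VARS[provider]
        # An unset or empty variable both count as missing.
        if var is not None and not env.get(var):
            problems.append(f"{var} is not set")
    return problems
```

For example, `missing_prerequisites("openai")` inspects the current interpreter and environment, and the `env` and `version` parameters allow the check to be run against explicit values.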

Highlighted Details

  • Supports multiple LMMs including OpenAI (GPT-4V, GPT-4 Turbo, GPT-4o), Google Gemini, and Ollama (LLaVA).
  • Offers a "Crawler Mode" for autonomous exploration of websites.
  • Includes a "Demo Mode" for interactive task execution and a "Configuration File" for customization.
  • Provides the Multimodal-Mind2Web dataset for training and evaluation.

Maintenance & Community

  • Active development with recent updates including a Chrome Extension, EMNLP'24 acceptance, and crawler mode.
  • Contact information for contributors and the OSU NLP group is available.
  • Project news is posted on Twitter.

Licensing & Compatibility

  • Code licensed under OPEN RAIL-S.
  • Data licensed under OPEN RAIL-D.
  • Model weights and parameters licensed under OPEN RAIL-M.
  • These licenses may restrict commercial use or redistribution.

Limitations & Caveats

The system is experimental research software and should be monitored closely while it runs. The project explicitly states that it does not support direct login actions, and it advises against using the agent for tasks that require account access because of the safety and legal risks.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 27 stars in the last 90 days
