AI-Employe by vignshwarar

Browser automation via GPT-4 Vision

Created 2 years ago

587 stars

Top 55.4% on SourcePulse

1 Expert Loves This Project

Meng Zhang

Cofounder of TabbyML

Project Summary

This project provides a browser automation tool that leverages GPT-4 Vision to interpret user actions and generate automation scripts. It targets developers and power users seeking to create complex browser workflows through intuitive, human-like instruction, aiming to simplify and enhance web automation tasks.

How It Works

The core innovation addresses element selection by indexing the entire DOM in MeiliSearch. GPT-4 Vision generates commands (e.g., "click this text"), which are then used to query the MeiliSearch index for the corresponding element ID. This approach aims for greater reliability than methods relying solely on visual coordinates or raw HTML. For workflow adherence, it employs an "Actions Augmented Generation" technique, embedding recorded DOM element changes from user actions within prompts to keep GPT focused on the task.
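The lookup step described above can be pictured with a small sketch. It is not the project's code: the `DomIndex` class is an in-memory stand-in for a full-text index such as MeiliSearch, and the `IndexedElement` and `ModelCommand` shapes, along with `resolveElementId`, are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape of one indexed DOM element; field names are illustrative.
interface IndexedElement {
  id: string;   // identifier assigned when the page is indexed
  tag: string;  // element tag, e.g. "button" or "a"
  text: string; // visible text content
}

// Hypothetical command format produced by the vision model.
interface ModelCommand {
  action: "click" | "type";
  text: string; // the text the model wants to act on
}

// Lowercase and split on non-alphanumeric characters.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Minimal in-memory stand-in for a full-text index such as MeiliSearch.
class DomIndex {
  private elements: IndexedElement[] = [];

  add(elements: IndexedElement[]): void {
    this.elements.push(...elements);
  }

  // Rank elements by how many query tokens appear in their text.
  search(query: string): IndexedElement[] {
    const tokens = tokenize(query);
    return this.elements
      .map((el) => {
        const words = new Set(tokenize(el.text));
        const score = tokens.filter((t) => words.has(t)).length;
        return { el, score };
      })
      .filter((hit) => hit.score > 0)
      .sort((a, b) => b.score - a.score)
      .map((hit) => hit.el);
  }
}

// Turn a model command into an element ID, or null when nothing matches.
function resolveElementId(index: DomIndex, command: ModelCommand): string | null {
  const first = index.search(command.text)[0];
  return first ? first.id : null;
}
```

In this sketch, elements with equal scores keep their indexing order, so two elements with identical text would always resolve to the first one. That is the same kind of ambiguity the project lists under duplicate text elements.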

Quick Start & Requirements

Install: Create a Firebase project, configure the service account key (firebaseAdmin/cert/dev.json or prod.json), and set up the .env file. Then run npm install and npm run db:deploy, and start with npm run dev (development) or npm run build followed by npm run start (production).
Prerequisites: Node.js, Rust, Postgres, MeiliSearch, Firebase account.
Output: Browser extension built in ./client/extension/build.
Docs: Firebase Auth Setup

Highlighted Details

Utilizes GPT-4 Vision for natural language-driven browser automation.
Employs MeiliSearch for robust DOM element indexing and retrieval.
Implements "Actions Augmented Generation" to maintain GPT workflow context.
Supports creating and executing recorded user workflows.
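
As a rough illustration of the "Actions Augmented Generation" idea listed above, the sketch below embeds recorded actions and their DOM changes into a prompt. The `RecordedAction` shape, the `buildPrompt` function, and the prompt wording are all assumptions made for this example. The README does not document the project's actual prompt format.

```typescript
// Hypothetical record of one user action and the DOM change it caused.
interface RecordedAction {
  action: string;    // e.g. "click" or "type"
  target: string;    // visible text of the element acted on
  domChange: string; // short description of what changed in the DOM
}

// Assemble a prompt that lists the recorded steps before asking for a command.
function buildPrompt(task: string, recorded: RecordedAction[]): string {
  const steps = recorded.map(
    (r, i) => `${i + 1}. ${r.action} "${r.target}" -> ${r.domChange}`
  );
  return [
    `Task: ${task}`,
    "Recorded workflow (follow these steps in order):",
    ...steps,
    "Reply with the next single command only.",
  ].join("\n");
}
```

Listing the recorded steps ahead of the request is one plausible way to keep the model anchored to the demonstrated workflow instead of improvising from the screenshot alone.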

Maintenance & Community

The project is maintained by vignshwarar. Further community or roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README describes the project as under active development, with scrolling, opening new tabs, and loop support still on the roadmap. Handling icons and duplicate text elements is noted as an ongoing challenge.

Health Check

Last Commit: 2 years ago

Responsiveness: Inactive

Star History

0 stars in the last 30 days