Agent-E by EmergenceAI

Agent automation system for web browser interaction

Created 1 year ago

1,197 stars

Top 32.6% on SourcePulse


1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

Agent-E is an AI-powered system for automating tasks in web browsers, built on the AG2 agent framework. It targets users who need to automate repetitive web interactions, and it offers a natural language interface for browser actions such as form filling, e-commerce navigation, and data extraction. Rather than letting an LLM generate browser automation code directly, Agent-E restricts the model to a curated library of "skills" that map to human-understandable browser actions, which is intended to make its behavior safer and more predictable.

How It Works

Agent-E uses a multi-agent architecture built mainly around a User Proxy agent and a Browser Navigation agent. Its core idea is the "Skills Library," a set of atomic browser actions, each described in natural language (e.g., click, enter_text, openurl). Instead of having LLMs write arbitrary code, Agent-E's agents select and orchestrate these predefined skills. To cope with the size and complexity of web pages, Agent-E applies "DOM Distillation," which reduces the HTML DOM to a smaller, more relevant JSON snapshot, often drawing on the Accessibility Tree and injecting custom mmid attributes so the LLM can refer to specific elements.
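
The skill-selection idea can be illustrated with a minimal sketch. This is not Agent-E's actual code: the `Skill` class, the `skill` decorator, the `execute` function, and the shape of the call dictionary are all hypothetical, and the skill bodies only return strings instead of driving a browser. The point is that the model can only name an entry in a fixed registry, so anything outside the library is rejected.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # natural-language description that would be shown to the LLM
    run: Callable[..., str]


# Hypothetical registry; Agent-E's real skills library is organized differently.
SKILLS: Dict[str, Skill] = {}


def skill(description: str):
    """Register a function as a named skill (illustrative decorator)."""
    def decorator(fn):
        SKILLS[fn.__name__] = Skill(fn.__name__, description, fn)
        return fn
    return decorator


@skill("Open the given URL in the current tab.")
def openurl(url: str) -> str:
    return f"opened {url}"  # placeholder for a real browser call


@skill("Click the element identified by its mmid.")
def click(mmid: str) -> str:
    return f"clicked element {mmid}"


@skill("Type text into the element identified by its mmid.")
def enter_text(mmid: str, text: str) -> str:
    return f"typed {text!r} into element {mmid}"


def execute(call: dict) -> str:
    """Run a skill call chosen by the model; reject names outside the library."""
    name = call.get("skill")
    if name not in SKILLS:
        raise ValueError(f"unknown skill: {name!r}")
    return SKILLS[name].run(**call.get("args", {}))
```

In this sketch, a model response such as `{"skill": "click", "args": {"mmid": "42"}}` is dispatched to `click`, while a request for an unregistered action raises `ValueError` instead of executing arbitrary code.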

Quick Start & Requirements

Installation: Run ./install.sh (macOS/Linux) or .\win_install.ps1 (Windows) from the project root. Pass -p to install Playwright.
Prerequisites: Python 3.11+ (recommended), uv package manager (installable via script or pip install uv), LLM API key (e.g., OpenAI), and optionally Google Chrome or Playwright drivers.
Configuration: Set environment variables in .env (e.g., AUTOGEN_MODEL_NAME, AUTOGEN_MODEL_API_KEY).
Running: Execute python -m ae.main.
Docs: https://github.com/EmergenceAI/Agent-E
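
Before running, it can help to confirm that the two variables named above are actually set. The following is a small standalone sketch, not part of Agent-E: `parse_env` and `missing_settings` are hypothetical helpers, and the parser handles only plain KEY=VALUE lines, which is an assumption about how the .env file is written.

```python
from pathlib import Path

# The two variable names come from the configuration step above.
REQUIRED_VARS = ("AUTOGEN_MODEL_NAME", "AUTOGEN_MODEL_API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values: dict) -> list:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    settings = parse_env(env_file.read_text()) if env_file.exists() else {}
    missing = missing_settings(settings)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```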

Highlighted Details

Leverages AG2 (formerly AutoGen) framework for agent orchestration.
Employs DOM Distillation for efficient LLM interaction with web pages.
Supports both Playwright and local Chrome browser instances.
Offers a FastAPI endpoint for programmatic control and integration.
Can be configured to use open-source LLMs via LiteLLM and Ollama.
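
As a rough illustration of the DOM Distillation idea, the sketch below reduces an HTML string to a short list of interactive elements, each tagged with an mmid. It is a simplification under stated assumptions: Agent-E works against a live browser page and often the Accessibility Tree, whereas this example only parses static HTML with Python's standard library. The `Distiller` class, the `distill` function, and the choice of kept tags and attributes are hypothetical.

```python
from html.parser import HTMLParser

# Assumed for illustration: which tags count as interactive, which attributes to keep.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRS = ("href", "name", "type", "placeholder", "aria-label")
VOID = {"input"}  # elements that never have a closing tag


class Distiller(HTMLParser):
    def __init__(self):
        super().__init__()
        self.nodes = []      # distilled snapshot, in document order
        self._open = []      # kept elements still collecting inner text
        self._next_id = 1

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE:
            return
        node = {"mmid": str(self._next_id), "tag": tag}
        self._next_id += 1
        attr_map = dict(attrs)
        for key in KEPT_ATTRS:
            if attr_map.get(key):
                node[key] = attr_map[key]
        self.nodes.append(node)
        if tag not in VOID:
            self._open.append(node)

    def handle_endtag(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            node = self._open[-1]
            node["text"] = (node.get("text", "") + " " + text).strip()


def distill(html: str) -> list:
    """Return a JSON-serializable list of interactive elements with mmid values."""
    parser = Distiller()
    parser.feed(html)
    parser.close()
    return parser.nodes
```

Passing the result through `json.dumps` gives a compact snapshot in which headings, layout wrappers, and other non-interactive markup are dropped, so a model sees only the elements it could act on and the mmid it would use to refer to each one.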

Maintenance & Community

The README's TODO list points to planned work, although the last recorded commit was 7 months ago.
Community discussion is available via Discord.
The project is associated with EmergenceAI.

Licensing & Compatibility

The README does not explicitly state a license. This requires clarification for commercial use or closed-source integration.

Limitations & Caveats

The project is unfinished: several planned features, including robust dropdown handling, task caching, and multi-tab support, are still pending. Compatibility with open-source LLMs is described as "not thoroughly tested." The current implementation works in a single tab only, and PDF text handling is limited.

Health Check

Last Commit

7 months ago

Responsiveness

1 week

Star History

7 stars in the last 30 days