Agent-E by EmergenceAI

Agent automation system for web browser interaction

created 1 year ago
1,153 stars

Top 34.2% on sourcepulse

Project Summary

Agent-E is an AI-powered system for automating tasks in web browsers, built on the AG2 agent framework. It targets users who need to automate repetitive web interactions and offers a natural language interface for browser actions such as form filling, e-commerce navigation, and data extraction. Rather than having an LLM generate browser automation code directly, it restricts the model to a curated library of "skills" that map to human-understandable browser actions, which is intended to make its behavior safer and more predictable.

How It Works

Agent-E utilizes a multi-agent architecture, primarily featuring a User Proxy agent and a Browser Navigation agent. The core innovation lies in its "Skills Library," which contains atomic, natural language-described actions for browser interaction (e.g., click, enter_text, openurl). Instead of LLMs writing arbitrary code, Agent-E's agents select and orchestrate these pre-defined skills. To manage the complexity of web pages, Agent-E employs "DOM Distillation," which refines the HTML DOM into a more relevant and digestible JSON snapshot, often leveraging the Accessibility Tree and injecting custom mmid attributes to guide LLM interactions.
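
The two ideas above, DOM Distillation and a fixed Skills Library, can be illustrated with a small sketch. This is not Agent-E's actual code: the toy page structure, the distill and run_skill helpers, and the skill decorator are hypothetical names chosen for illustration, and a real implementation would work against a live browser rather than nested dictionaries. Only the skill names click and enter_text and the mmid attribute come from the description above.

```python
import json
from typing import Callable, Dict, List, Optional

# Toy DOM: nested dicts standing in for real browser nodes (hypothetical shape).
PAGE = {
    "tag": "body",
    "children": [
        {"tag": "script", "text": "trackVisit()"},
        {"tag": "div", "children": [
            {"tag": "input", "name": "q", "role": "textbox"},
            {"tag": "button", "text": "Search", "role": "button"},
        ]},
    ],
}

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


def distill(node: dict, out: Optional[List[dict]] = None) -> List[dict]:
    """Flatten the tree, keep only interactive elements, tag each with an mmid."""
    if out is None:
        out = []
    if node["tag"] in INTERACTIVE_TAGS:
        mmid = str(len(out) + 1)
        node["mmid"] = mmid  # "inject" the id so a skill can find the element later
        out.append({
            "mmid": mmid,
            "tag": node["tag"],
            "role": node.get("role", ""),
            "name": node.get("name") or node.get("text", ""),
        })
    for child in node.get("children", []):
        distill(child, out)
    return out


def find_by_mmid(node: dict, mmid: str) -> Optional[dict]:
    """Depth-first lookup of the node carrying the given mmid."""
    if node.get("mmid") == mmid:
        return node
    for child in node.get("children", []):
        found = find_by_mmid(child, mmid)
        if found is not None:
            return found
    return None


# Skills registry: the model may only pick names that appear here.
SKILLS: Dict[str, Callable[..., str]] = {}


def skill(description: str):
    """Register a function as a skill with a natural language description."""
    def register(fn):
        fn.description = description
        SKILLS[fn.__name__] = fn
        return fn
    return register


@skill("Click the element with the given mmid.")
def click(mmid: str) -> str:
    el = find_by_mmid(PAGE, mmid)
    if el is None:
        return f"no element with mmid={mmid}"
    return f"clicked <{el['tag']}> mmid={mmid}"


@skill("Type text into the element with the given mmid.")
def enter_text(mmid: str, text: str) -> str:
    el = find_by_mmid(PAGE, mmid)
    if el is None:
        return f"no element with mmid={mmid}"
    el["value"] = text
    return f"entered {text!r} into mmid={mmid}"


def run_skill(call: dict) -> str:
    """Execute a skill call chosen by the model; unknown names are rejected."""
    fn = SKILLS.get(call["name"])
    if fn is None:
        return f"unknown skill: {call['name']}"
    return fn(**call["args"])


snapshot = distill(PAGE)
print(json.dumps(snapshot, indent=2))
print(run_skill({"name": "enter_text", "args": {"mmid": "1", "text": "laptops"}}))
print(run_skill({"name": "click", "args": {"mmid": "2"}}))
print(run_skill({"name": "eval_js", "args": {}}))  # not a registered skill
```

In this sketch the script node never reaches the snapshot, and a request for anything outside the registry is refused instead of executed, which is the predictability argument made for skills over generated code.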

Quick Start & Requirements

  • Installation: Run ./install.sh (macOS/Linux) or .\win_install.ps1 (Windows) from the project root. Pass -p to install Playwright.
  • Prerequisites: Python 3.11+ (recommended), uv package manager (installable via script or pip install uv), LLM API key (e.g., OpenAI), and optionally Google Chrome or Playwright drivers.
  • Configuration: Set environment variables in .env (e.g., AUTOGEN_MODEL_NAME, AUTOGEN_MODEL_API_KEY).
  • Running: Execute python -m ae.main.
  • Docs: https://github.com/EmergenceAI/Agent-E
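
As a rough pre-flight check for the configuration step above, the following sketch parses .env-style text and reports which of the two variables named in the list are missing or empty. The parse_env and missing_settings helpers are hypothetical, the parser handles only simple KEY=VALUE lines, and the sample values are placeholders; the project may load its settings differently.

```python
from typing import Dict, List, Mapping

# The two variable names mentioned in the configuration step.
REQUIRED = ("AUTOGEN_MODEL_NAME", "AUTOGEN_MODEL_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required names that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name, "").strip()]


# Placeholder content standing in for a real .env file.
SAMPLE = """
# LLM settings
AUTOGEN_MODEL_NAME=your-model-name
AUTOGEN_MODEL_API_KEY=
"""

print(missing_settings(parse_env(SAMPLE)))
```

To check a real file, the same functions can be applied to the text read from .env in the project root.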

Highlighted Details

  • Leverages AG2 (formerly AutoGen) framework for agent orchestration.
  • Employs DOM Distillation for efficient LLM interaction with web pages.
  • Supports both Playwright and local Chrome browser instances.
  • Offers a FastAPI endpoint for programmatic control and integration.
  • Can be configured to use open-source LLMs via LiteLLM and Ollama.
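
For the FastAPI integration mentioned above, a client could submit a task as a JSON POST request. The sketch below only builds the request and does not send it. The base URL, the /execute_task path, and the "command" field are assumptions made for illustration and should be checked against the project README before use.

```python
import json
import urllib.request


def build_task_request(
    command: str,
    base_url: str = "http://127.0.0.1:8000",  # assumed local server address
    path: str = "/execute_task",              # assumed route name
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying a natural language task."""
    body = json.dumps({"command": command}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_task_request("open example.com and report the page title")
print(req.get_method(), req.full_url)

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```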

Maintenance & Community

  • Active development is indicated by the TODO list.
  • Community discussion is available via Discord.
  • The project is associated with EmergenceAI.

Licensing & Compatibility

  • The README does not explicitly state a license. This requires clarification for commercial use or closed-source integration.

Limitations & Caveats

The project is under active development, and several planned features are still pending, including robust dropdown handling, task caching, and multi-tab support. The current implementation supports only single-tab usage, and PDF text handling is limited. Compatibility with open-source LLMs is noted as "not thoroughly tested."

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0

Star History

  • 55 stars in the last 90 days
