Agent-E by EmergenceAI

Agent automation system for web browser interaction

Created 1 year ago
1,187 stars

Top 32.8% on SourcePulse

Project Summary

Agent-E is an AI-powered system for automating tasks in web browsers, built on the AG2 agent framework. It targets users who need to automate repetitive web interactions and offers a natural language interface for browser actions such as form filling, e-commerce navigation, and data extraction. Rather than having an LLM generate browser automation code directly, it relies on a curated library of "skills" that map to human-understandable browser actions, with the aim of being safer and more predictable.

How It Works

Agent-E uses a multi-agent architecture built around two agents: a User Proxy agent and a Browser Navigation agent. Its core innovation is the "Skills Library," a set of atomic browser actions, each described in natural language (e.g., click, enter_text, openurl). Instead of writing arbitrary code, the agents select and orchestrate these pre-defined skills. To manage the complexity of web pages, Agent-E applies "DOM Distillation," which reduces the HTML DOM to a smaller, more relevant JSON snapshot, often drawing on the Accessibility Tree and injecting custom mmid attributes that the LLM uses to refer to specific elements.
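The two ideas above can be illustrated with a small, self-contained sketch. It is not Agent-E's code: the toy DOM format, the INTERACTIVE_TAGS set, and the distill, Skill, and dispatch names are all hypothetical, and the skills only return strings instead of driving a browser. It shows a nested page being reduced to a short list of interactive elements tagged with mmid values, and a skill call being dispatched by name from a fixed registry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Tags treated as interactive in this toy example (an assumption,
# not Agent-E's actual filtering rules).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


def distill(node: dict, out: Optional[List[dict]] = None) -> List[dict]:
    """Flatten a nested toy DOM into a list of interactive elements with mmids."""
    if out is None:
        out = []
    if node.get("tag") in INTERACTIVE_TAGS:
        mmid = str(len(out) + 1)
        node["mmid"] = mmid  # mimic injecting the identifier back into the page
        out.append({"mmid": mmid, "tag": node["tag"], "name": node.get("name", "")})
    for child in node.get("children", []):
        distill(child, out)
    return out


@dataclass
class Skill:
    description: str         # natural-language description shown to the LLM
    run: Callable[..., str]  # the pre-defined implementation


def click(mmid: str) -> str:
    return f"clicked element {mmid}"


def enter_text(mmid: str, text: str) -> str:
    return f"typed {text!r} into element {mmid}"


# Hypothetical registry: the model may only pick from these names.
SKILLS: Dict[str, Skill] = {
    "click": Skill("Click the element with the given mmid.", click),
    "enter_text": Skill("Type text into the element with the given mmid.", enter_text),
}


def dispatch(call: dict) -> str:
    """Run a call shaped like {"skill": name, "args": {...}}; reject unknown skills."""
    skill = SKILLS.get(call["skill"])
    if skill is None:
        raise ValueError(f"unknown skill: {call['skill']}")
    return skill.run(**call["args"])


page = {
    "tag": "body",
    "children": [
        {"tag": "div", "children": [{"tag": "input", "name": "Search"}]},
        {"tag": "p", "children": []},
        {"tag": "button", "name": "Go"},
    ],
}

snapshot = distill(page)  # only the input and the button survive
result = dispatch({"skill": "enter_text", "args": {"mmid": "1", "text": "laptops"}})
print(snapshot)
print(result)
```

Because the model can only name a skill from the registry, an unexpected request fails with an error instead of running arbitrary code, which is the predictability argument made above.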

Quick Start & Requirements

  • Installation: Run ./install.sh (macOS/Linux) or .\win_install.ps1 (Windows) from the project root. Pass -p to install Playwright.
  • Prerequisites: Python 3.11+ (recommended), uv package manager (installable via script or pip install uv), LLM API key (e.g., OpenAI), and optionally Google Chrome or Playwright drivers.
  • Configuration: Set environment variables in .env (e.g., AUTOGEN_MODEL_NAME, AUTOGEN_MODEL_API_KEY).
  • Running: Execute python -m ae.main.
  • Docs: https://github.com/EmergenceAI/Agent-E
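Before running python -m ae.main, it can help to confirm that the model settings are actually present. The check below is a minimal sketch, not part of Agent-E: missing_settings and REQUIRED_SETTINGS are hypothetical names, and treating both variables as required is an assumption, since the summary lists them only as examples. It inspects a mapping such as os.environ and does not parse the .env file itself.

```python
import os
from typing import List, Mapping

# Variable names taken from the configuration step above; whether both
# are strictly required is an assumption made for this sketch.
REQUIRED_SETTINGS = ("AUTOGEN_MODEL_NAME", "AUTOGEN_MODEL_API_KEY")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]


# Example with placeholder values (not real credentials or model names).
example_env = {"AUTOGEN_MODEL_NAME": "your-model-name", "AUTOGEN_MODEL_API_KEY": ""}
print(missing_settings(example_env))
```

Calling missing_settings() with no argument checks the current process environment, so it reflects .env values only if something has already loaded them.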

Highlighted Details

  • Leverages AG2 (formerly AutoGen) framework for agent orchestration.
  • Employs DOM Distillation for efficient LLM interaction with web pages.
  • Supports both Playwright and local Chrome browser instances.
  • Offers a FastAPI endpoint for programmatic control and integration.
  • Can be configured to use open-source LLMs via LiteLLM and Ollama.

Maintenance & Community

  • The project's TODO list points to planned work; the most recent commit was 5 months ago.
  • Community discussion is available via Discord.
  • The project is associated with EmergenceAI.

Licensing & Compatibility

  • The README does not explicitly state a license. This requires clarification for commercial use or closed-source integration.

Limitations & Caveats

The project is unfinished: planned features such as robust dropdown handling, task caching, and multi-tab support are still pending. Compatibility with open-source LLMs is noted as "not thoroughly tested." The current implementation supports only single-tab usage, and PDF text handling is limited.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 30 days
