Agent-E by EmergenceAI

Agent automation system for web browser interaction

Created 1 year ago
1,169 stars

Top 33.2% on SourcePulse

Project Summary

Agent-E is an AI-powered system for automating tasks in web browsers, built on the AG2 agent framework. It targets users who need to automate repetitive web interactions and offers a natural-language interface for browser actions such as form filling, e-commerce navigation, and data extraction. Rather than having an LLM generate browser automation code directly, it draws on a curated library of "skills" that map to human-understandable browser actions, which is intended to make its behavior safer and more predictable.

How It Works

Agent-E uses a multi-agent architecture built mainly around a User Proxy agent and a Browser Navigation agent. Its core innovation is the "Skills Library," a set of atomic browser actions, each described in natural language (e.g., click, enter_text, openurl). Instead of having LLMs write arbitrary code, Agent-E's agents select and orchestrate these predefined skills. To manage the complexity of web pages, Agent-E applies "DOM Distillation," which reduces the HTML DOM to a smaller, more relevant JSON snapshot, often drawing on the Accessibility Tree and injecting custom mmid attributes so the LLM can refer to specific elements.
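
As a rough illustration of the skills idea, the sketch below registers a few named actions with natural-language descriptions and dispatches structured calls to them, rejecting anything outside the library. It is not Agent-E's code: the Skill and SkillsLibrary classes, the register and execute methods, the {"skill": ..., "args": ...} call shape, and the argument names are all hypothetical, and a list stands in for the browser. Only the skill names click, enter_text, and openurl come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    """One atomic action plus the description an LLM would be shown."""
    name: str
    description: str
    run: Callable[..., str]


class SkillsLibrary:
    """Hypothetical registry; not Agent-E's actual classes."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, name: str, description: str):
        """Decorator that adds a function to the library under `name`."""
        def decorator(func: Callable[..., str]) -> Callable[..., str]:
            self._skills[name] = Skill(name, description, func)
            return func
        return decorator

    def describe(self) -> List[dict]:
        """The menu of actions an agent may choose from."""
        return [{"name": s.name, "description": s.description}
                for s in self._skills.values()]

    def execute(self, call: dict) -> str:
        """Run one structured call; anything not in the library is refused."""
        name = call.get("skill")
        if name not in self._skills:
            return f"error: unknown skill {name!r}"
        try:
            return self._skills[name].run(**call.get("args", {}))
        except TypeError as exc:  # wrong or missing arguments
            return f"error: bad arguments for {name}: {exc}"


library = SkillsLibrary()
browser_log: List[str] = []  # stand-in for a real browser session


@library.register("openurl", "Open the given URL in the browser.")
def openurl(url: str) -> str:
    browser_log.append(f"open {url}")
    return f"opened {url}"


@library.register("enter_text", "Type text into the element with the given mmid.")
def enter_text(mmid: str, text: str) -> str:
    browser_log.append(f"type {mmid} {text}")
    return f"typed {text!r} into {mmid}"


@library.register("click", "Click the element with the given mmid.")
def click(mmid: str) -> str:
    browser_log.append(f"click {mmid}")
    return f"clicked {mmid}"


# A plan such as an agent might emit: structured calls, not free-form code.
plan = [
    {"skill": "openurl", "args": {"url": "https://example.com"}},
    {"skill": "enter_text", "args": {"mmid": "12", "text": "laptop"}},
    {"skill": "click", "args": {"mmid": "13"}},
    {"skill": "run_js", "args": {"code": "alert(1)"}},  # not in the library
]
results = [library.execute(call) for call in plan]
```

In this sketch the last call is refused because run_js was never registered, which is the property the skills approach is after: the model can only pick from a fixed, reviewable menu of actions.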

Quick Start & Requirements

  • Installation: Run ./install.sh (macOS/Linux) or .\win_install.ps1 (Windows) from the project root. Pass -p to install Playwright.
  • Prerequisites: Python 3.11+ (recommended), uv package manager (installable via script or pip install uv), LLM API key (e.g., OpenAI), and optionally Google Chrome or Playwright drivers.
  • Configuration: Set environment variables in .env (e.g., AUTOGEN_MODEL_NAME, AUTOGEN_MODEL_API_KEY).
  • Running: Execute python -m ae.main.
  • Docs: https://github.com/EmergenceAI/Agent-E
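
Before launching, it can help to confirm that the configuration step above actually took effect. The following is a minimal sketch of such a check, not part of Agent-E: the missing_settings helper is hypothetical, and it only covers the two variable names given as examples above, so the real project may need more. It inspects a mapping of environment variables and does not itself read the .env file.

```python
import os
from typing import List, Mapping

# The two names listed as examples in the configuration step; the real
# project may require additional variables.
REQUIRED_VARS = ("AUTOGEN_MODEL_NAME", "AUTOGEN_MODEL_API_KEY")


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment:
example_env = {"AUTOGEN_MODEL_NAME": "gpt-4o", "AUTOGEN_MODEL_API_KEY": "  "}
problems = missing_settings(example_env)
if problems:
    print("Set these before running python -m ae.main: " + ", ".join(problems))
```

Calling missing_settings() with no argument checks the current process environment, which is useful once the .env values have been exported or loaded.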

Highlighted Details

  • Leverages AG2 (formerly AutoGen) framework for agent orchestration.
  • Employs DOM Distillation for efficient LLM interaction with web pages.
  • Supports both Playwright and local Chrome browser instances.
  • Offers a FastAPI endpoint for programmatic control and integration.
  • Can be configured to use open-source LLMs via LiteLLM and Ollama.
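
To make the DOM Distillation item more concrete, here is a toy sketch of the general idea: walk the HTML, keep only interactive elements, tag each with an mmid, and emit a compact JSON snapshot. It is an assumption-laden simplification, not Agent-E's implementation, which reportedly works with the Accessibility Tree of a live page. The Distiller class, the distill function, the choice of tags and attributes, and the sequential mmid numbering are all hypothetical; only the mmid name comes from the description above.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements that never have a closing tag
KEPT_ATTRS = ("name", "type", "href", "placeholder", "aria-label")


class Distiller(HTMLParser):
    """Collects interactive elements and assigns each a sequential mmid."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[dict] = []
        self._open: Optional[dict] = None  # element currently collecting text
        self._next_id = 1

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        node = {"mmid": str(self._next_id), "tag": tag}
        self._next_id += 1
        for key in KEPT_ATTRS:
            if attr_map.get(key):
                node[key] = attr_map[key]
        self.nodes.append(node)
        self._open = None if tag in VOID_TAGS else node

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None

    def handle_data(self, data):
        text = data.strip()
        if self._open is not None and text:
            joined = (self._open.get("text", "") + " " + text).strip()
            self._open["text"] = joined


def distill(html: str) -> List[dict]:
    """Return a list of small dicts describing the page's interactive elements."""
    parser = Distiller()
    parser.feed(html)
    parser.close()
    return parser.nodes


PAGE = """
<html><body>
  <div class="hero"><h1>Shop</h1><p>Welcome back!</p></div>
  <form>
    <input type="text" name="q" placeholder="Search products">
    <button type="submit">Search</button>
  </form>
  <a href="/cart">View <b>cart</b></a>
</body></html>
"""

snapshot = distill(PAGE)
print(json.dumps(snapshot, indent=2))
```

The decorative heading and paragraph are dropped, while the input, button, and link survive with identifiers that a skill such as click could then reference.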

Maintenance & Community

  • Active development is indicated by the TODO list.
  • Community discussion is available via Discord.
  • The project is associated with EmergenceAI.

Licensing & Compatibility

  • The README does not explicitly state a license. This requires clarification for commercial use or closed-source integration.

Limitations & Caveats

The project is under active development, and several planned features, including robust dropdown handling, task caching, and multi-tab support, are still pending. Compatibility with open-source LLMs is noted as "not thoroughly tested." The current implementation supports only a single tab, and PDF text handling is limited.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

  • open-operator by browserbase: Template for building web agents using Browserbase and Stagehand. 0.4%, 2k. Created 7 months ago, updated 3 months ago. Starred by Travis Fischer (Founder of Agentic), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 4 more.
  • stagehand by browserbase: AI browser automation framework for production. 0.5%, 17k. Created 1 year ago, updated 1 day ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 15 more.