Chrome-GPT by richardyc

AutoGPT agent for Chrome control

created 2 years ago
1,737 stars

Top 25.2% on sourcepulse

Project Summary

Chrome-GPT is an experimental AutoGPT agent that uses LangChain and Selenium to automate user interactions in a Chrome browser. It is aimed at users who want to automate web-based tasks such as form filling, data extraction, and navigating complex websites, and it exposes browser control through a natural language interface.

How It Works

The agent uses Selenium to control a Chrome browser instance programmatically, enabling actions such as scrolling, clicking elements, and entering text into web forms. It relies on LangChain to manage agent logic and memory and to interact with large language models (LLMs) such as GPT-3.5 and GPT-4, which interpret tasks and generate browser commands. This approach allows complex, multi-step web automation driven by natural language prompts.
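The loop described above (a model emits a command, and the agent turns it into a browser action) can be sketched roughly as follows. This is a minimal illustration, not Chrome-GPT's actual code: the JSON command shape, the `dispatch` function, and the `FakeBrowser` class are all hypothetical names chosen for this example. A real implementation would back the same three methods with Selenium calls instead of the in-memory stand-in used here.

```python
import json
from typing import List, Tuple


class FakeBrowser:
    """Hypothetical stand-in for a Selenium-backed browser wrapper.

    It only records the actions it receives, so the sketch runs
    without Chrome or Selenium installed.
    """

    def __init__(self) -> None:
        self.log: List[Tuple[str, ...]] = []

    def click(self, selector: str) -> str:
        self.log.append(("click", selector))
        return f"clicked {selector}"

    def type_text(self, selector: str, text: str) -> str:
        self.log.append(("type_text", selector, text))
        return f"typed into {selector}"

    def scroll(self, pixels: int) -> str:
        self.log.append(("scroll", str(pixels)))
        return f"scrolled {pixels}px"


def dispatch(browser, llm_output: str) -> str:
    """Parse one model-generated command (assumed JSON) and run it.

    Errors are returned as strings rather than raised, so they could be
    fed back to the model as an observation on the next step.
    """
    try:
        command = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        return f"error: could not parse model output ({exc.msg})"
    if not isinstance(command, dict):
        return "error: command must be a JSON object"

    action = command.get("action")
    args = command.get("args", {})
    handlers = {
        "click": browser.click,
        "type_text": browser.type_text,
        "scroll": browser.scroll,
    }
    handler = handlers.get(action) if isinstance(action, str) else None
    if handler is None:
        return f"error: unknown action {action!r}"
    try:
        return handler(**args)
    except TypeError as exc:
        # Raised for missing/unexpected argument names or a non-mapping "args".
        return f"error: bad arguments for {action}: {exc}"
```

Returning errors as text instead of raising keeps the loop going when the model produces malformed output, which matters for an agent that plans several steps ahead.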

Quick Start & Requirements

  • Install via poetry install.
  • Requires Python >3.8 and an OpenAI API key (set as the OPENAI_API_KEY environment variable).
  • Run with python -m chromegpt -t "{your request}".
  • GPT-4 usage: python -m chromegpt -v -a auto-gpt -m gpt-4 -t "{your request}".
  • Docker setup available via docker-compose up.
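For scripted use, the commands above could be assembled from Python. The sketch below uses only the flags shown in the list (-t, -v, -a, -m); the helper names `build_command` and `require_api_key` are hypothetical and not part of Chrome-GPT.

```python
import os
from typing import List, Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or fail early with a clear message."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


def build_command(
    task: str,
    agent: Optional[str] = None,
    model: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Build the argument list for `python -m chromegpt`.

    Only flags that appear in the quick-start commands are used.
    """
    cmd = ["python", "-m", "chromegpt"]
    if verbose:
        cmd.append("-v")
    if agent:
        cmd += ["-a", agent]
    if model:
        cmd += ["-m", model]
    cmd += ["-t", task]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` after `require_api_key()` succeeds. Passing a list rather than a shell string avoids quoting problems when the task text contains spaces or quotes.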

Highlighted Details

  • Supports Google searches and interactive Chrome actions (scrolling, clicking, form input).
  • Integrates memory management for long-term and short-term context.
  • Offers multiple agent types: Zero-shot, BabyAGI, and Auto-GPT.
  • Chrome plugin support is in progress.

Maintenance & Community

The project is experimental and primarily maintained by its creator, richardyc. Community engagement and development status are not explicitly detailed beyond the GitHub repository.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The agent has limited web crawling capabilities and occasionally fails to identify buttons and input fields. Responses are slow, taking 1 to 10 seconds per action. LangChain agents are known to have trouble parsing GPT outputs; if such problems arise, trying a different agent type is suggested. The project is marked as experimental, so instability and unexpected behavior should be expected.
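One way to act on the suggestion to switch agent types is to try them in order until one completes. The following is a sketch under stated assumptions: `run_with_fallback` is a hypothetical helper, the `run` callable stands in for whatever actually launches the agent, and the agent names are supplied by the caller rather than taken from the project.

```python
from typing import Callable, List, Sequence, Tuple


def run_with_fallback(
    task: str,
    agent_types: Sequence[str],
    run: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each agent type in order; return (agent_type, result) for the first success.

    `run(task, agent_type)` is expected to raise an exception on failure,
    for example when the model output cannot be parsed.
    """
    errors: List[str] = []
    for agent_type in agent_types:
        try:
            return agent_type, run(task, agent_type)
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            errors.append(f"{agent_type}: {exc}")
    raise RuntimeError("all agent types failed: " + "; ".join(errors))
```

Because each attempt may take many slow browser actions, the order of `agent_types` matters: the type most likely to succeed should come first.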

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

14 stars in the last 90 days
