Automation agent for precise system actions
This project provides an open-source implementation of an AI agent designed to automate basic computer tasks using PyAutoGUI for direct system interaction. It targets users seeking to automate repetitive desktop actions, offering precise control over mouse and keyboard inputs, window management, and screen capture.
How It Works
The agent operates as a multi-modal system, continuously capturing screenshots to interpret the screen's visual state. It uses a chain-of-thought process to break tasks into steps, relying on a get_screen_info tool that captures screenshots and identifies screen coordinates. A multi-modal LLM analyzes this visual information to guide the agent. Actions are executed via a PythonREPLAst tool, which drives PyAutoGUI for mouse, keyboard, and window manipulation.
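The loop described above (observe the screen, let the model decide, execute the resulting Python) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Clevrr-Computer's actual code: it assumes the LLM replies with short Python snippets, and every name here (FakePyAutoGUI, fake_screen_info, scripted_planner, run_ast_repl, agent_loop) is hypothetical. run_ast_repl imitates the general behavior of AST-based REPL tools (execute all statements, then evaluate the last one if it is an expression). A recording stand-in replaces pyautogui so the sketch runs without a display.

```python
import ast
from typing import Any, Callable, Optional


class FakePyAutoGUI:
    """Hypothetical stand-in for pyautogui: records calls instead of moving the real mouse."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple]] = []

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", (x, y)))

    def write(self, text: str) -> None:
        self.calls.append(("write", (text,)))


def run_ast_repl(code: str, namespace: dict[str, Any]) -> Any:
    """Exec every statement; if the last one is an expression, eval it and return its value."""
    tree = ast.parse(code)
    if not tree.body:
        return None
    *body, last = tree.body
    exec(compile(ast.Module(body=body, type_ignores=[]), "<agent>", "exec"), namespace)
    if isinstance(last, ast.Expr):
        return eval(compile(ast.Expression(body=last.value), "<agent>", "eval"), namespace)
    exec(compile(ast.Module(body=[last], type_ignores=[]), "<agent>", "exec"), namespace)
    return None


def fake_screen_info() -> dict[str, int]:
    # Hypothetical placeholder for get_screen_info: a real tool would return a screenshot
    # plus coordinate hints for the multi-modal LLM.
    return {"width": 1920, "height": 1080}


def scripted_planner(task: str, screen: dict, history: list) -> Optional[str]:
    # Hypothetical stand-in for the LLM: returns the next code snippet, or None when done.
    steps = ["pyautogui.click(100, 200)", "pyautogui.write('hello')"]
    return steps[len(history)] if len(history) < len(steps) else None


def agent_loop(
    task: str,
    get_screen_info: Callable[[], dict],
    plan_next_action: Callable[[str, dict, list], Optional[str]],
    namespace: dict[str, Any],
    max_steps: int = 5,
) -> list[tuple[str, Any]]:
    """Observe -> plan -> act until the planner returns None or max_steps is reached."""
    history: list[tuple[str, Any]] = []
    for _ in range(max_steps):
        screen = get_screen_info()                      # observe
        code = plan_next_action(task, screen, history)  # reason
        if code is None:
            break
        result = run_ast_repl(code, namespace)          # act
        history.append((code, result))
    return history


if __name__ == "__main__":
    fake = FakePyAutoGUI()
    agent_loop("type hello", fake_screen_info, scripted_planner, {"pyautogui": fake})
    print(fake.calls)  # [('click', (100, 200)), ('write', ('hello',))]
```

In a real deployment the namespace would hold the actual pyautogui module instead of the fake. Executing model-generated code with exec is also why the prompt-injection caveat under Limitations matters.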
Quick Start & Requirements
Clone the repository and install the dependencies:
git clone https://github.com/Clevrr-AI/Clevrr-Computer.git
pip install -r requirements.txt
Set up a .env file, then start the agent with python main.py. Models can be selected with --model openai or --model gemini, and the floating UI can be disabled with --float-ui 0.
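The two documented flags can be illustrated with argparse. This is a hypothetical sketch of how such a command line might be parsed, not the project's actual main.py: the defaults (openai, and a floating UI that is on unless set to 0) are assumptions chosen for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; defaults are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the agent's CLI flags")
    parser.add_argument("--model", choices=["openai", "gemini"], default="openai",
                        help="which multi-modal LLM backend to use")
    parser.add_argument("--float-ui", type=int, choices=[0, 1], default=1,
                        help="0 disables the floating UI")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --float-ui as args.float_ui
    print(f"model={args.model} float_ui={args.float_ui}")
```

For example, python main.py --model gemini --float-ui 0 would yield model=gemini and float_ui=0 under this sketch.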
Maintenance & Community
Contact: yurvaj@getclevrr.com. Contributions are welcome via pull requests.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
This is a beta feature with significant risks, especially when interacting with the internet. The system may follow instructions within web content or images that override user commands, posing a prompt injection risk. Precautions like using virtual machines and limiting data access are strongly recommended.
Last updated 9 months ago; the repository is marked inactive.