Clevrr-Computer by Clevrr-AI

Automation agent for precise system actions

Created 1 year ago

309 stars

Top 87.1% on SourcePulse

2 Experts Love This Project

Rodrigo Nader

Cofounder of Langflow

Harrison Chase

Founder of LangChain

Project Summary

This project provides an open-source implementation of an AI agent designed to automate basic computer tasks using PyAutoGUI for direct system interaction. It targets users seeking to automate repetitive desktop actions, offering precise control over mouse and keyboard inputs, window management, and screen capture.

How It Works

The agent is multi-modal: it repeatedly captures screenshots to read the current visual state of the screen. It uses chain-of-thought reasoning to break a task into steps. A get_screen_info tool captures a screenshot and identifies screen coordinates, and a multi-modal LLM analyzes that visual information to decide the next step. Actions are executed through a PythonREPLAst tool, which interfaces with PyAutoGUI for mouse, keyboard, and window manipulation.
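The observe, plan, and act cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names Step, run_agent, capture_screen, plan_next, and execute are hypothetical, and the screenshot, LLM, and REPL pieces are passed in as plain callables so the sketch runs without a display or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One planning step (hypothetical structure, for illustration only)."""
    thought: str          # the model's reasoning for this step
    code: Optional[str]   # Python snippet to execute; None means the task is done


def run_agent(
    task: str,
    capture_screen: Callable[[], object],
    plan_next: Callable[[str, object, List[Step]], Step],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[Step]:
    """Observe the screen, ask the model for one step, execute it, repeat."""
    history: List[Step] = []
    for _ in range(max_steps):
        screen = capture_screen()                # in practice: a screenshot
        step = plan_next(task, screen, history)  # in practice: a multi-modal LLM call
        history.append(step)
        if step.code is None:                    # the planner signals completion
            break
        execute(step.code)                       # in practice: a Python REPL tool
    return history


# Stub wiring: no real screenshots, model calls, or mouse movement happen here.
executed: List[str] = []


def fake_plan(task: str, screen: object, history: List[Step]) -> Step:
    if not history:
        return Step("Click the search box first.", "pyautogui.click(100, 200)")
    return Step("The task looks complete.", None)


steps = run_agent("open search", lambda: "fake-screenshot", fake_plan, executed.append)
print(len(steps), executed)
```

In a real setup, capture_screen could wrap pyautogui.screenshot() and execute could hand the snippet to a Python REPL tool. The max_steps cap is an assumption added here so that a confused planner cannot loop indefinitely.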

Quick Start & Requirements

Install: Run git clone https://github.com/Clevrr-AI/Clevrr-Computer.git, then run pip install -r requirements.txt from the cloned directory.
Prerequisites: An Azure OpenAI or Google Gemini API key is required, configured in a .env file.
Usage: Run with python main.py. The model can be selected with --model openai or --model gemini. The floating UI can be disabled with --float-ui 0.
Documentation: https://github.com/Clevrr-AI/Clevrr-Computer
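To make the command-line options above concrete, here is a minimal argparse sketch of how flags like --model and --float-ui might be parsed. It is an illustration only and does not reproduce the project's main.py: the function names, the default values, and the assumption that --float-ui also accepts 1 are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument(
        "--model",
        choices=["openai", "gemini"],
        default="gemini",  # assumed default, not confirmed by the summary
        help="which multi-modal model backend to use",
    )
    parser.add_argument(
        "--float-ui",
        type=int,
        choices=[0, 1],
        default=1,         # assumed default: floating UI enabled
        help="set to 0 to disable the floating UI",
    )
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when argv is None)."""
    return build_parser().parse_args(argv)


# Equivalent to: python main.py --model openai --float-ui 0
args = parse_cli(["--model", "openai", "--float-ui", "0"])
print(args.model, bool(args.float_ui))
```

argparse exposes --float-ui as args.float_ui, since dashes in option names become underscores in the parsed namespace.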

Highlighted Details

Implements Anthropic's Computer Use concept for AI-driven desktop automation.
Utilizes PyAutoGUI for precise mouse, keyboard, and window control.
Supports both Azure OpenAI and Google Gemini models for screen analysis.
Features a floating UI for persistent on-screen interaction.

Maintenance & Community

Contact: yurvaj@getclevrr.com. Contributions are welcomed via pull requests.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The agent is in beta and carries significant risks, especially when it interacts with the internet. It may follow instructions embedded in web content or images even when they conflict with the user's commands, which is a prompt injection risk. Running it in a virtual machine and limiting its access to sensitive data are strongly recommended.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

6 stars in the last 30 days