Clevrr-Computer by Clevrr-AI

Automation agent for precise system actions

created 9 months ago
285 stars

Top 92.8% on sourcepulse

Project Summary

This project provides an open-source implementation of an AI agent designed to automate basic computer tasks using PyAutoGUI for direct system interaction. It targets users seeking to automate repetitive desktop actions, offering precise control over mouse and keyboard inputs, window management, and screen capture.

How It Works

The agent operates as a multi-modal system, continuously capturing screenshots to interpret the screen's visual state. It leverages a chain-of-thought process to break down tasks, using a get_screen_info tool that captures screenshots and identifies screen coordinates. A multi-modal LLM analyzes this visual information to guide the agent. Actions are executed via a PythonREPLAst tool, which interfaces with PyAutoGUI for mouse, keyboard, and window manipulation.
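The observe, reason, act cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the project's actual code: the names `Action`, `run_agent`, `capture`, `decide`, and `execute` are hypothetical, and the backends are stubs so the sketch runs without a display or an API key. In a real setup, `capture` would presumably wrap something like `pyautogui.screenshot()`, `decide` would call the multi-modal LLM, and `execute` would hand the generated code to the Python REPL tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step chosen by the model (hypothetical structure)."""
    thought: str          # the model's reasoning for this step
    code: Optional[str]   # Python text to run; None means the task is finished


def run_agent(
    task: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[Action]:
    """Capture the screen, ask the model for the next step, act, and repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                       # observe the current screen
        action = decide(task, screenshot, history)   # reason about the next step
        history.append(action)
        if action.code is None:                      # model signals completion
            break
        execute(action.code)                         # act on the system
    return history


# Stub backends (assumptions for illustration only).
def fake_capture() -> bytes:
    return b"fake-png-bytes"


def fake_decide(task: str, screenshot: bytes, history: List[Action]) -> Action:
    if not history:
        return Action("Open the start menu first.", "pyautogui.press('win')")
    return Action("The task looks complete.", None)


executed: List[str] = []  # records the code strings instead of running them
steps = run_agent("open the start menu", fake_capture, fake_decide, executed.append)
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running indefinitely if the model never reports completion.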

Quick Start & Requirements

  • Install: git clone https://github.com/Clevrr-AI/Clevrr-Computer.git followed by pip install -r requirements.txt.
  • Prerequisites: Azure OpenAI or Google Gemini API keys are required, configured via a .env file.
  • Usage: Run with python main.py. Models can be specified with --model openai or --model gemini. Floating UI can be disabled with --float-ui 0.
  • Documentation: https://github.com/Clevrr-AI/Clevrr-Computer
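The command-line options listed above could be parsed along the following lines. This is a sketch under assumptions rather than the contents of the project's main.py: the defaults (`gemini`, floating UI on) and the environment variable names in `REQUIRED_KEYS` are hypothetical and should be checked against the repository's README and its .env template. Loading the .env file itself (for example with a dotenv library) is left out to keep the example dependency-free.

```python
import argparse
import os
from typing import List, Mapping, Optional

# Hypothetical variable names; the real ones are defined by the repository.
REQUIRED_KEYS = {
    "openai": ["AZURE_OPENAI_API_KEY"],
    "gemini": ["GOOGLE_API_KEY"],
}


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags mentioned in the usage notes."""
    parser = argparse.ArgumentParser(description="Desktop automation agent (sketch)")
    parser.add_argument(
        "--model",
        choices=["openai", "gemini"],
        default="gemini",  # assumed default
        help="which multi-modal backend analyzes the screen",
    )
    parser.add_argument(
        "--float-ui",
        type=int,
        choices=[0, 1],
        default=1,  # assumed default
        help="1 shows the floating UI, 0 disables it",
    )
    return parser


def missing_keys(model: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty for this model."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS[model] if not env.get(key)]


# Equivalent to: python main.py --model openai --float-ui 0
args = build_parser().parse_args(["--model", "openai", "--float-ui", "0"])
```

Checking for missing keys before starting the agent gives a clear error up front instead of a failure on the first model call.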

Highlighted Details

  • Implements Anthropic's Computer Use concept for AI-driven desktop automation.
  • Utilizes PyAutoGUI for precise mouse, keyboard, and window control.
  • Supports both Azure OpenAI and Google Gemini models for screen analysis.
  • Features a floating UI for persistent on-screen interaction.

Maintenance & Community

Contact: yurvaj@getclevrr.com. Contributions are welcomed via pull requests.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

This is beta software with significant risks, especially when it interacts with the internet. The agent may follow instructions embedded in web content or images even when they conflict with the user's commands, which is a prompt injection risk. Precautions such as running it in a virtual machine and limiting its access to sensitive data are strongly recommended.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 90 days
