vision-agent by askui

Python tool for AI-driven desktop, mobile, and HMI automation

created 9 months ago
346 stars

Project Summary

This project provides an AI-powered framework for automating desktop and mobile tasks across Windows, macOS, Linux, Android, and iOS. It targets developers and RPA engineers seeking to integrate AI agents for UI automation, offering both step-by-step commands and intent-based instructions for flexible task execution.

How It Works

The framework combines a custom-built "Agent OS" for cross-platform UI interaction (screenshots, mouse control, typing) with various AI models for element recognition and action execution. Users can leverage models like Anthropic's Claude or AskUI's proprietary Prompt-to-Automation (PTA) models, allowing for flexible AI integration and on-premise deployment.
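
The split described above, an OS layer that performs low-level actions and a model that turns an instruction into those actions, can be sketched as follows. This is a minimal illustration only: `AgentOS`, `RecordingOS`, `canned_planner`, and `act` are hypothetical names invented for this sketch and are not the askui API. The planner is a hard-coded stand-in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class AgentOS(Protocol):
    """Hypothetical stand-in for the OS interaction layer (not the askui API)."""

    def click(self, target: str) -> None: ...
    def type_text(self, text: str) -> None: ...


@dataclass
class RecordingOS:
    """Fake OS layer that records actions instead of driving a real mouse or keyboard."""

    log: list[tuple[str, str]] = field(default_factory=list)

    def click(self, target: str) -> None:
        self.log.append(("click", target))

    def type_text(self, text: str) -> None:
        self.log.append(("type", text))


# A planner maps a goal to (action, argument) pairs. A real one would query an AI model.
Planner = Callable[[str], list[tuple[str, str]]]


def canned_planner(goal: str) -> list[tuple[str, str]]:
    """Assumption for illustration: returns a fixed plan and ignores the goal text."""
    return [("click", "search field"), ("type", "askui")]


def act(os_layer: AgentOS, planner: Planner, goal: str) -> None:
    """Intent-based control: ask the planner for steps, then run them on the OS layer."""
    for action, arg in planner(goal):
        if action == "click":
            os_layer.click(arg)
        elif action == "type":
            os_layer.type_text(arg)
        else:
            raise ValueError(f"unknown action: {action}")


# Step-by-step control: the caller issues each command directly.
step_os = RecordingOS()
step_os.click("search field")
step_os.type_text("askui")

# Intent-based control: the caller states a goal and the planner picks the commands.
intent_os = RecordingOS()
act(intent_os, canned_planner, "search for askui")
```

In this sketch both styles end up issuing the same low-level calls; the difference is only who decides the sequence, the caller or the model behind the planner.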

Quick Start & Requirements

  • Install Agent OS: Download the OS-specific installer from the provided links (Windows, Linux, macOS). Linux users must run an Xorg session; Wayland is not supported.
  • Install Python package: pip install askui (requires Python >= 3.10).
  • Authentication: Set environment variables for AI model providers (e.g., ANTHROPIC_API_KEY, ASKUI_WORKSPACE_ID, ASKUI_TOKEN).
  • Demo: Test with Hugging Face models via Spaces API (rate-limited).
  • Docs: askui.com
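
Before running an agent, it can help to confirm that the requirements listed above are met. The following is a minimal preflight sketch using only the standard library. The variable names come from the list above; grouping them by provider (`anthropic` versus `askui`) is an assumption based on their prefixes, and `missing_vars` and `python_ok` are hypothetical helpers, not part of the askui package.

```python
import os
import sys
from collections.abc import Mapping

# Assumed grouping of the environment variables named in the quick start.
REQUIRED_VARS = {
    "anthropic": ("ANTHROPIC_API_KEY",),
    "askui": ("ASKUI_WORKSPACE_ID", "ASKUI_TOKEN"),
}


def missing_vars(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables for `provider` that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


def python_ok(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Check the interpreter against the stated minimum of Python 3.10."""
    return version >= (3, 10)


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.10 or newer is required")
    for provider in REQUIRED_VARS:
        missing = missing_vars(provider)
        if missing:
            print(f"{provider}: missing {', '.join(missing)}")
```

Only the variables for the model provider actually in use need to be set, so a report of missing variables for the other provider can be ignored.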

Highlighted Details

  • Supports Windows, Linux, macOS, Android, iOS, and Citrix.
  • Offers in-background automation on Windows.
  • Allows hot-swapping and retraining of AI models.
  • Provides direct access to underlying OS and browser tools.
  • Supports advanced element locating using visual descriptions and AI elements.

Maintenance & Community

  • Active development with a Discord community available via invite link.
  • Telemetry is enabled by default but can be disabled.

Licensing & Compatibility

  • The license is not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The experimental chat feature has numerous known issues, including inability to stop agents, lack of retry options, and focus problems.
  • Response schema extraction is limited to the default askui model.
  • Multi-monitor support requires manual display number selection.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 week
  • Pull requests (30d): 27
  • Issues (30d): 0
  • Star history: 49 stars in the last 90 days
