OpenAdapt by OpenAdaptAI

AI-first process automation tool using multimodal models

created 2 years ago
1,339 stars

Top 30.6% on sourcepulse

Project Summary

OpenAdapt provides an open-source framework for AI-first process automation, enabling Large Multimodal Models (LMMs) to interact with desktop and web GUIs. It targets developers and researchers looking to automate repetitive GUI tasks by leveraging AI, offering an alternative to traditional RPA tools.

How It Works

OpenAdapt records user interactions, including screenshots and input events. It converts this data into a tokenized format, allowing transformer models to generate synthetic inputs for replaying actions. The system is model-agnostic and emphasizes learning from human demonstrations to create auto-generated prompts, grounding agents in existing processes to mitigate hallucinations and improve task completion.
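The record-then-serialize idea above can be sketched roughly as follows. This is a minimal illustration, not OpenAdapt's actual schema or API: the `RecordedEvent` class, its fields, and the `serialize_event` and `build_prompt` helpers are all hypothetical names introduced here, and the text layout is only one plausible way to turn a demonstration into a prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordedEvent:
    """Hypothetical record of one user input event (not OpenAdapt's schema)."""
    timestamp: float                      # seconds since the recording started
    kind: str                             # e.g. "click" or "type"
    x: Optional[int] = None               # cursor position, if relevant
    y: Optional[int] = None
    text: Optional[str] = None            # typed text, if relevant
    screenshot_id: Optional[str] = None   # reference to the captured screenshot


def serialize_event(event: RecordedEvent) -> str:
    """Render one event as a compact text line that a model could consume."""
    parts = [f"t={event.timestamp:.2f}", event.kind]
    if event.x is not None and event.y is not None:
        parts.append(f"at=({event.x},{event.y})")
    if event.text is not None:
        parts.append(f"text={event.text!r}")
    if event.screenshot_id is not None:
        parts.append(f"screen={event.screenshot_id}")
    return " ".join(parts)


def build_prompt(task: str, events: list[RecordedEvent]) -> str:
    """Assemble a prompt grounded in a recorded human demonstration."""
    lines = [f"Task: {task}", "Demonstration:"]
    lines += [f"{i}. {serialize_event(e)}" for i, e in enumerate(events, start=1)]
    lines.append("Next action:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        RecordedEvent(0.0, "click", x=10, y=20, screenshot_id="s1"),
        RecordedEvent(1.5, "type", text="hi"),
    ]
    print(build_prompt("fill in the greeting field", demo))
```

Grounding the prompt in the recorded steps, rather than asking a model to plan from scratch, is the part of this sketch that corresponds to the hallucination-mitigation claim above.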

Quick Start & Requirements

  • Installation: Scripted installers are available for Windows (PowerShell) and macOS (bash). Manual setup and installation via Poetry are also supported.
  • Prerequisites: Python 3.10, Git, Tesseract, nvm. macOS users need to configure system permissions for Accessibility.
  • Usage: Start the system tray and dashboard with python -m openadapt.entrypoint. Record actions with python -m openadapt.record "description". Visualize with python -m openadapt.visualize or run the dashboard with python -m openadapt.app.dashboard.run. Replay actions using various strategies like python -m openadapt.replay NaiveReplayStrategy.
  • Browser Integration: Requires a Chrome extension and setting RECORD_BROWSER_EVENTS to true.
  • Docs: GitBook Documentation available.
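
The usage commands listed above can be collected in a small helper, for example to drive them from a script. This is only a sketch: the module names and the NaiveReplayStrategy argument come from the list above, while the `openadapt_command` helper and the `COMMANDS` table are hypothetical names introduced here. The code only builds and prints the command lines; it does not run OpenAdapt.

```python
import shlex
import sys


def openadapt_command(module: str, *args: str) -> list[str]:
    """Build a `python -m <module> [args...]` command for the current interpreter."""
    return [sys.executable, "-m", module, *args]


# Commands taken from the usage list; the dictionary keys are illustrative labels.
COMMANDS = {
    "tray": openadapt_command("openadapt.entrypoint"),
    "record": openadapt_command("openadapt.record", "description"),
    "visualize": openadapt_command("openadapt.visualize"),
    "dashboard": openadapt_command("openadapt.app.dashboard.run"),
    "replay": openadapt_command("openadapt.replay", "NaiveReplayStrategy"),
}

if __name__ == "__main__":
    for label, command in COMMANDS.items():
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print(f"{label}: {shlex.join(command)}")
```

To actually launch one of these, a command list could be passed to `subprocess.run(COMMANDS["record"], check=True)` in an environment where OpenAdapt is installed.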

Highlighted Details

  • AI-first process automation using LMMs for GUI interaction.
  • Model-agnostic design with auto-prompting based on human demonstrations.
  • Supports various replay strategies, including visual and stateful approaches.
  • Features PII/PHI scrubbing, decentralized data distribution, and performance monitoring.

Maintenance & Community

  • Active development with open contract positions for developers.
  • Community engagement via Discord and a Request for Comments process.
  • GitHub issues are tracked for bug reports and feature requests.

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Keep recordings short (under a minute) for now: recording can be memory-intensive, and there is an open issue about memory leaks. Touchpad/trackpad gesture support is limited to cursor movement and clicks.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 91 stars in the last 90 days
