OpenAdapt by OpenAdaptAI

AI-first process automation tool using multimodal models

Created 2 years ago
1,375 stars

Top 29.3% on SourcePulse

Project Summary

OpenAdapt provides an open-source framework for AI-first process automation, enabling Large Multimodal Models (LMMs) to interact with desktop and web GUIs. It targets developers and researchers looking to automate repetitive GUI tasks by leveraging AI, offering an alternative to traditional RPA tools.

How It Works

OpenAdapt records user interactions, including screenshots and input events. It converts this data into a tokenized format, allowing transformer models to generate synthetic inputs for replaying actions. The system is model-agnostic and emphasizes learning from human demonstrations to create auto-generated prompts, grounding agents in existing processes to mitigate hallucinations and improve task completion.
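The record-then-tokenize flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenAdapt's actual data model or prompt format: the `ActionEvent` class, its fields, and the `events_to_prompt` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionEvent:
    """Hypothetical recorded input event (not OpenAdapt's real schema)."""

    name: str                    # kind of input, e.g. "click" or "type"
    timestamp: float             # seconds since the recording started
    x: Optional[int] = None      # cursor position, for mouse events
    y: Optional[int] = None
    text: Optional[str] = None   # typed text, for keyboard events


def event_to_line(event: ActionEvent) -> str:
    """Serialize one event into a compact, model-readable text line."""
    parts = [f"t={event.timestamp:.2f}", event.name]
    if event.x is not None and event.y is not None:
        parts.append(f"at=({event.x},{event.y})")
    if event.text is not None:
        parts.append(f"text={event.text!r}")
    return " ".join(parts)


def events_to_prompt(task: str, events: list[ActionEvent]) -> str:
    """Build a prompt that grounds the model in a human demonstration."""
    lines = [f"Task: {task}", "Demonstration:"]
    lines += [f"{i}. {event_to_line(e)}" for i, e in enumerate(events, start=1)]
    lines.append("Next action:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        ActionEvent("click", 0.5, x=10, y=20),
        ActionEvent("type", 1.25, text="hi"),
    ]
    print(events_to_prompt("Fill in the greeting field", demo))
```

The resulting text could be sent to any model alongside the recorded screenshots, which is one way to read the "model-agnostic" claim: the demonstration is serialized once and the model behind it is interchangeable.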

Quick Start & Requirements

  • Installation: Scripted installers are available for Windows (PowerShell) and macOS (bash). Manual setup needs the prerequisites listed below; installation via Poetry is also supported.
  • Prerequisites: Python 3.10, Git, Tesseract (for OCR), and nvm. macOS users must also grant Accessibility permissions in the system settings.
  • Usage: Start the system tray and dashboard with python -m openadapt.entrypoint. Record actions with python -m openadapt.record "description". Visualize a recording with python -m openadapt.visualize, or run the dashboard on its own with python -m openadapt.app.dashboard.run. Replay a recording with a chosen strategy, for example python -m openadapt.replay NaiveReplayStrategy.
  • Browser Integration: Requires a Chrome extension and setting RECORD_BROWSER_EVENTS to true.
  • Docs: GitBook Documentation available.
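The record and replay commands above can also be driven from a script. The sketch below only assembles and launches the command lines quoted in the Usage bullet; the helper names (`module_command`, `record_command`, `replay_command`, `run`) are hypothetical and not part of OpenAdapt, and it assumes OpenAdapt is installed in the same Python environment that runs the script.

```python
import subprocess
import sys


def module_command(module: str, *args: str) -> list[str]:
    """Build an argv list equivalent to `python -m <module> <args...>`."""
    return [sys.executable, "-m", module, *args]


def record_command(description: str) -> list[str]:
    """Command line for recording a demonstration with a description."""
    return module_command("openadapt.record", description)


def replay_command(strategy: str = "NaiveReplayStrategy") -> list[str]:
    """Command line for replaying with the named strategy."""
    return module_command("openadapt.replay", strategy)


def run(command: list[str]) -> int:
    """Run a command and return its exit code (assumes OpenAdapt is installed)."""
    return subprocess.run(command).returncode


if __name__ == "__main__":
    # Printed rather than executed, so the sketch is safe to run anywhere.
    print(record_command("fill in the weekly report"))
    print(replay_command())
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the recording description contains spaces.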

Highlighted Details

  • AI-first process automation using LMMs for GUI interaction.
  • Model-agnostic design with auto-prompting based on human demonstrations.
  • Supports various replay strategies, including visual and stateful approaches.
  • Features PII/PHI scrubbing, decentralized data distribution, and performance monitoring.

Maintenance & Community

  • Active development with open contract positions for developers.
  • Community engagement via Discord and a Request for Comments process.
  • GitHub issues are tracked for bug reports and feature requests.

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Keep recordings short (under a minute): recording can be memory-intensive, and there is an open issue about memory leaks. Touchpad/trackpad gesture support is limited to cursor movement and clicks.

Health Check

  • Last commit: 6 months ago.
  • Responsiveness: Inactive.
  • Pull requests (30d): 0.
  • Issues (30d): 0.
  • Star history: 19 stars in the last 30 days.
