OpenAdapt by OpenAdaptAI

AI-first process automation tool using multimodal models

Created 2 years ago

1,466 stars

Top 27.7% on SourcePulse

Project Summary

OpenAdapt provides an open-source framework for AI-first process automation, enabling Large Multimodal Models (LMMs) to interact with desktop and web GUIs. It targets developers and researchers looking to automate repetitive GUI tasks by leveraging AI, offering an alternative to traditional RPA tools.

How It Works

OpenAdapt records user interactions, including screenshots and input events, and converts this data into a tokenized format that transformer models can use to generate synthetic inputs for replaying actions. The system is model-agnostic and builds prompts automatically from human demonstrations, which grounds agents in existing processes to reduce hallucinations and improve task completion.
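
As a rough illustration of the record-then-prompt idea described above, the sketch below serializes a list of recorded input events into a text prompt that a model could be asked to continue. The `ActionEvent` fields, the tag format, and the `build_prompt` helper are all hypothetical and are not taken from OpenAdapt's actual schema or code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActionEvent:
    """One recorded input event (hypothetical shape, not OpenAdapt's schema)."""
    name: str                             # e.g. "click" or "type"
    timestamp: float                      # seconds since the recording started
    x: Optional[int] = None               # pointer position, if any
    y: Optional[int] = None
    text: Optional[str] = None            # typed text, if any
    screenshot_id: Optional[str] = None   # reference to the paired screenshot


def serialize_event(event: ActionEvent) -> str:
    """Flatten one event into a compact, token-friendly string."""
    parts = [f"<{event.name}>"]
    if event.x is not None and event.y is not None:
        parts.append(f"x={event.x} y={event.y}")
    if event.text is not None:
        parts.append(f"text={event.text!r}")
    if event.screenshot_id is not None:
        parts.append(f"screen={event.screenshot_id}")
    return " ".join(parts)


def build_prompt(task: str, events: List[ActionEvent]) -> str:
    """Turn a demonstration into a prompt that ends where the model should act."""
    ordered = sorted(events, key=lambda e: e.timestamp)  # replay in recorded order
    lines = [f"Task: {task}", "Demonstration:"]
    lines += [f"{i}. {serialize_event(e)}" for i, e in enumerate(ordered, start=1)]
    lines.append("Next action:")
    return "\n".join(lines)
```

Because the prompt is built from what the user actually did, the model is asked to continue an existing process rather than invent one, which is the grounding effect the summary refers to.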

Quick Start & Requirements

Installation: Scripted installers are available for Windows (PowerShell) and macOS (bash). Manual setup and installation via Poetry are also supported.
Prerequisites: Python 3.10, Git, Tesseract (for OCR), and nvm. macOS users must also configure Accessibility permissions in system settings.
Usage: Start the system tray and dashboard with python -m openadapt.entrypoint. Record actions with python -m openadapt.record "description". Visualize a recording with python -m openadapt.visualize, or run the dashboard on its own with python -m openadapt.app.dashboard.run. Replay actions with a chosen strategy, for example python -m openadapt.replay NaiveReplayStrategy.
Browser Integration: Requires a Chrome extension and setting RECORD_BROWSER_EVENTS to true.
Docs: GitBook Documentation available.
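
The usage commands listed above can be collected in one place for scripting. The sketch below only builds the argument lists and does not run anything. The module names and the NaiveReplayStrategy argument come from the commands above, while the `openadapt_commands` helper and its dictionary keys are hypothetical conveniences, not part of OpenAdapt.

```python
import shlex
import sys
from typing import Dict, List


def openadapt_commands(
    description: str,
    strategy: str = "NaiveReplayStrategy",
    python: str = sys.executable,
) -> Dict[str, List[str]]:
    """Return argv lists for the documented workflow steps (nothing is executed)."""
    base = [python, "-m"]
    return {
        "tray": base + ["openadapt.entrypoint"],            # system tray and dashboard
        "record": base + ["openadapt.record", description],  # record a demonstration
        "visualize": base + ["openadapt.visualize"],         # inspect the recording
        "dashboard": base + ["openadapt.app.dashboard.run"],  # dashboard only
        "replay": base + ["openadapt.replay", strategy],     # replay with a strategy
    }


def as_shell(argv: List[str]) -> str:
    """Render an argv list as a copy-pasteable, safely quoted shell line."""
    return shlex.join(argv)
```

Each list can be passed to `subprocess.run` once OpenAdapt is installed. Keeping arguments as lists rather than one string avoids quoting problems when the recording description contains spaces.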

Highlighted Details

AI-first process automation using LMMs for GUI interaction.
Model-agnostic design with auto-prompting based on human demonstrations.
Supports various replay strategies, including visual and stateful approaches.
Features PII/PHI scrubbing, decentralized data distribution, and performance monitoring.

Maintenance & Community

Active development with open contract positions for developers.
Community engagement via Discord and a Request for Comments process.
Bug reports and feature requests are tracked in GitHub issues.

Licensing & Compatibility

MIT License.
Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Keep recordings short (under a minute) for now: recording can be memory-intensive, and there is an open issue regarding memory leaks. Touchpad/trackpad gesture support is limited to cursor movement and clicks.

Health Check

Last commit: 3 weeks ago
Responsiveness: Inactive
Star history: 29 stars in the last 30 days