understudy  by understudy-ai

Automate computer tasks with a teachable desktop agent

Created 1 month ago
412 stars

Top 70.9% on SourcePulse

Summary

Understudy is a teachable desktop agent designed to automate routine computer tasks by learning from user demonstrations. It addresses the fragmentation of AI tools by operating across GUI, browser, and shell interfaces within a single local runtime. Targeting power users and engineers, it offers a no-code, no-API approach to task automation, allowing users to simply show it a task once and have it learn, remember, and eventually optimize its execution.

How It Works

The project employs a five-layer progression mirroring human learning: Watch, Imitate, Remember, Optimize, and Anticipate. Understudy operates software natively, learns task intent (not just coordinates) from user demonstrations, crystallizes successful execution paths into memory, and automatically discovers faster routes over time. This approach avoids complex API integrations or workflow builders, focusing on intuitive, human-like learning.
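The "Remember" step described above, where successful execution paths are crystallized into memory, might be pictured roughly as follows. This is only an illustrative sketch: `PathMemory`, `recordSuccess`, and the promotion threshold of 3 are hypothetical names and values chosen for the example, not part of Understudy's actual API or behavior.

```typescript
// Hypothetical sketch: none of these names come from Understudy itself.
class PathMemory {
  // Number of successful runs seen so far, keyed by task plus step sequence.
  private readonly successes = new Map<string, number>();
  // Assumed number of successful runs before a path counts as crystallized.
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Records one successful run of `steps` for `task`.
  // Returns true once that exact path has succeeded `threshold` times.
  recordSuccess(task: string, steps: readonly string[]): boolean {
    const key = `${task}::${steps.join(" > ")}`;
    const count = (this.successes.get(key) ?? 0) + 1;
    this.successes.set(key, count);
    return count >= this.threshold;
  }
}
```

Keying on the full step sequence means a different route to the same task is counted separately, which is one simple way to keep only paths that have repeatedly worked.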

Quick Start & Requirements

  • Install via npm: npm install -g @understudy-ai/understudy
  • Run setup: understudy wizard
  • Required: Node.js >= 20.6.
  • macOS: Xcode Command Line Tools, plus Accessibility and Screen Recording permissions.
  • Optional: Chrome, Playwright, ffmpeg, and tesseract enhance functionality.
  • Official documentation and demos are available via links in the README.
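The Node.js >= 20.6 requirement above can be checked before installing. The snippet below is a minimal sketch of such a check; `parseVersion`, `meetsMinimum`, and `MIN_NODE` are hypothetical helpers written for this example and are not shipped with Understudy.

```typescript
// Hypothetical helpers: not part of Understudy.
// Parses "major.minor.patch" (optionally prefixed with "v") into numbers.
function parseVersion(version: string): number[] {
  return version
    .replace(/^v/, "")
    .split(".")
    .map((part) => Number.parseInt(part, 10) || 0);
}

// Returns true when `actual` is greater than or equal to `minimum`,
// comparing each component numerically rather than as text.
function meetsMinimum(actual: string, minimum: string): boolean {
  const a = parseVersion(actual);
  const m = parseVersion(minimum);
  const length = Math.max(a.length, m.length);
  for (let i = 0; i < length; i++) {
    const left = i < a.length ? a[i] : 0;
    const right = i < m.length ? m[i] : 0;
    if (left !== right) return left > right;
  }
  return true;
}

// Minimum taken from the requirement stated above.
const MIN_NODE = "20.6.0";
```

In a Node.js script, the running version is available as `process.versions.node`, so `meetsMinimum(process.versions.node, MIN_NODE)` would report whether the current runtime satisfies the requirement.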

Highlighted Details

  • Unified desktop runtime integrates GUI, browser, shell, and messaging channels.
  • GUI grounding uses a dual-model architecture with high accuracy and HiDPI support.
  • "Teach" feature enables intent-based learning from demonstrations, generating reusable skills.
  • Crystallization loop automatically identifies and publishes recurring workflows as workspace skills.
  • Route optimization prioritizes faster execution paths (API > CLI > Browser > GUI).
  • Development progresses through five layers: Layers 1-2 are implemented, Layers 3-4 are partially implemented, and Layer 5 is a long-term vision.
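The route-optimization priority listed above (API > CLI > Browser > GUI) can be illustrated with a small selection function. This is a sketch under assumptions: the `Route` type, `ROUTE_PRIORITY`, and `pickRoute` are hypothetical names, and how Understudy actually discovers or ranks routes is not described in the source.

```typescript
// Route kinds in the priority order given above: API > CLI > Browser > GUI.
type Route = "api" | "cli" | "browser" | "gui";

const ROUTE_PRIORITY: readonly Route[] = ["api", "cli", "browser", "gui"];

// Hypothetical: returns the highest-priority route that is available
// for a task, or undefined when no route is available at all.
function pickRoute(available: ReadonlySet<Route>): Route | undefined {
  return ROUTE_PRIORITY.find((route) => available.has(route));
}
```

For example, a task that can be done either through the GUI or a CLI would resolve to `"cli"`, while a task with only a GUI path falls back to `"gui"`.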

Maintenance & Community

The project maintains an active Discord community. Key areas for contribution include developing GUI backends for Linux/Windows, creating new skill modules, enhancing route discovery, and improving teach functionality.

Licensing & Compatibility

Understudy is released under the MIT license, which permits commercial use and integration into closed-source projects, provided the copyright and license notice are retained.

Limitations & Caveats

Native GUI automation and demonstration recording are currently macOS-exclusive; Linux and Windows support is planned. Layers 3-5 are in partial or conceptual stages of development. Full functionality requires granting specific macOS system permissions.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 17
  • Issues (30d): 1
  • Star history: 320 stars in the last 30 days
