Peekaboo by steipete

macOS GUI automation and screenshot analysis tool

Created 3 months ago
623 stars

Top 53.0% on SourcePulse

Project Summary

Peekaboo is a macOS CLI and optional MCP server that lets AI agents capture screenshots and automate GUI interactions. It combines fast screen capture, AI-powered image analysis, and GUI automation, and it targets developers, researchers, and power users who want to add visual context and control to their AI workflows.

How It Works

Peekaboo employs a service-based architecture with a shared core for screen capture, UI automation, and window management. It leverages macOS's native APIs for efficient screen recording and UI element identification. The tool supports local AI models via Ollama for privacy-conscious analysis and integrates with AI assistants through the Model Context Protocol (MCP).
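
To make the local-analysis path concrete, here is a minimal sketch of how a captured screenshot could be sent to a locally running Ollama model. This is not Peekaboo's own code: the endpoint is Ollama's default local address, and the model name `llava`, the helper names, and the file path in the usage note are assumptions chosen for illustration.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for a single, non-streaming image prompt.

    `llava` is an assumed vision-capable model name, not something the
    summary above specifies.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def analyze(image_path: str, prompt: str, model: str = "llava") -> str:
    """Send one screenshot to the local model and return its text answer."""
    with open(image_path, "rb") as f:
        payload = build_request(f.read(), prompt, model)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With an Ollama server running and a vision model pulled, `analyze("screenshot.png", "What is on screen?")` would return the model's description; nothing leaves the machine, which is the privacy benefit the summary refers to.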

Quick Start & Requirements

  • Installation: Homebrew (brew install peekaboo), direct download, npm (npm install -g @steipete/peekaboo-mcp), or build from source.
  • Requirements: macOS 14.0+ (Sonoma or later), Screen Recording permission, optional Accessibility permission.
  • Setup: Minimal setup for CLI usage; MCP server setup requires configuration within AI assistant settings (e.g., Claude Desktop, Cursor).
  • Docs: https://github.com/steipete/peekaboo
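
The MCP setup step usually amounts to adding one entry to the assistant's JSON configuration. The sketch below generates such an entry in the `mcpServers` format used by Claude Desktop. Launching the npm package through `npx -y` and the server label `peekaboo` are assumptions for illustration; check the project docs for the exact recommended entry.

```python
import json


def peekaboo_mcp_entry(package: str = "@steipete/peekaboo-mcp") -> dict:
    """Build an `mcpServers` fragment that starts the npm package via npx."""
    return {
        "mcpServers": {
            # Hypothetical label; the assistant shows this name in its tool list.
            "peekaboo": {
                "command": "npx",
                # "-y" lets npx fetch the package without an interactive prompt.
                "args": ["-y", package],
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be merged into the assistant's config file.
    print(json.dumps(peekaboo_mcp_entry(), indent=2))
```

Merge the printed fragment into the assistant's existing configuration rather than overwriting the file, so other configured servers are preserved.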

Highlighted Details

  • GUI Automation (v3): Natural language control for clicking, typing, scrolling, and menu interactions.
  • AI Agent: Understands and executes complex tasks using OpenAI, Anthropic, or Grok models.
  • Multi-Screen Support: Manages and interacts with windows across multiple displays.
  • MCP Integration: Acts as both an MCP server and client, supporting external tools like BrowserMCP.

Maintenance & Community

  • Author: Peter Steinberger (@steipete).
  • Community: Primarily driven by the author; no explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

The project requires macOS 14.0+ and specific system permissions (Screen Recording, optional Accessibility) for full functionality. While local AI models are supported, advanced agent features rely on external API keys.
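
A script that depends on Peekaboo could verify the OS requirement before invoking it. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it checks the macOS version only, since the Screen Recording and Accessibility permissions cannot be queried this way.

```python
import platform

# Minimum version stated in the requirements above.
MIN_MACOS = (14, 0)


def parse_version(text: str) -> tuple:
    """Turn a version string such as '14.5.1' into a (major, minor) tuple."""
    parts = [int(p) for p in text.split(".")[:2] if p.isdigit()]
    while len(parts) < 2:
        parts.append(0)  # treat a missing component as 0, e.g. '15' -> (15, 0)
    return tuple(parts)


def meets_requirement(version_text: str) -> bool:
    """Return True if the given macOS version is 14.0 or later."""
    return parse_version(version_text) >= MIN_MACOS


def current_host_ok() -> bool:
    """Check the running machine; mac_ver() yields '' on non-macOS systems."""
    release = platform.mac_ver()[0]
    return bool(release) and meets_requirement(release)
```

Calling `current_host_ok()` at startup lets a wrapper script fail early with a clear message instead of surfacing a less obvious error later.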

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 4
  • Star History: 36 stars in the last 30 days
