Screen parsing tool for vision-based GUI agents
Top 1.8% on sourcepulse
OmniParser provides a method for parsing UI screenshots into structured elements, enabling vision-based GUI agents like GPT-4V to accurately ground actions in specific interface regions. It targets developers building agents for computer use and offers improved action generation and interaction capabilities.
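As a sketch of what a "structured element" can look like, the record below shows one plausible shape for a parsed element and how an agent might ground a click with it. The field names are illustrative assumptions, not OmniParser's actual output schema:

```python
# Illustrative only: a plausible record for one parsed UI element.
# Field names are assumptions, not OmniParser's actual schema.
element = {
    "bbox": (128, 52, 160, 84),   # pixel coordinates (x1, y1, x2, y2)
    "interactable": True,         # predicted interactability
    "caption": "search icon",     # functional description of the element
}

# An agent can ground an action (e.g. a click) at the element's center,
# instead of guessing raw coordinates from the screenshot.
x1, y1, x2, y2 = element["bbox"]
click_point = ((x1 + x2) // 2, (y1 + y2) // 2)
```

This is the key benefit for a vision-based agent: the model chooses *which* element to act on, and the parser supplies *where* it is.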
How It Works
OmniParser employs a two-stage approach: first, an interactive region detection model identifies UI elements, and second, an icon functional description model captions these elements. This allows for fine-grained parsing, including small icons and interactability prediction, which is crucial for precise agent control.
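The two-stage flow described above can be sketched as follows, with both models stubbed out. The function names, return types, and stub outputs are assumptions for illustration; the real system uses a YOLO-based region detector and a learned captioning model:

```python
from dataclasses import dataclass

@dataclass
class Region:
    bbox: tuple        # (x1, y1, x2, y2) in pixels
    interactable: bool # interactability prediction from stage 1

def detect_regions(screenshot) -> list:
    # Stage 1: interactive region detection (stubbed with fixed boxes).
    return [Region((10, 10, 40, 40), True), Region((0, 60, 200, 80), False)]

def caption_region(screenshot, region: Region) -> str:
    # Stage 2: functional description of the cropped element (stubbed).
    return "settings icon" if region.interactable else "status text"

def parse_screen(screenshot) -> list:
    # Combine both stages into the structured output an agent consumes.
    return [
        {"bbox": r.bbox, "interactable": r.interactable,
         "caption": caption_region(screenshot, r)}
        for r in detect_regions(screenshot)
    ]

elements = parse_screen(screenshot=None)
```

Separating detection from captioning is what lets the pipeline handle small icons: stage 2 runs on a tight crop of each detected region rather than on the full screenshot.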
Quick Start & Requirements
After cloning the repository, install the dependencies:
pip install -r requirements.txt
Highlighted Details
Maintenance & Community
Licensing & Compatibility
The icon_detect model is AGPL (inherited from YOLO), while the icon_caption models are MIT.
Limitations & Caveats
The AGPL license for the detection model may restrict its use in proprietary software. Documentation for new features like multi-agent orchestration is still in progress.