OmniParser by microsoft

Screen parsing tool for vision-based GUI agents

created 10 months ago
22,990 stars

Top 1.8% on sourcepulse

Project Summary

OmniParser parses UI screenshots into structured elements so that vision-language models such as GPT-4V can ground their actions in specific regions of the interface. It targets developers building computer-use agents, where grounded element data improves action generation and interaction.

How It Works

OmniParser works in two stages. An interactable-region detection model (icon_detect) first locates UI elements in the screenshot; an icon functional-description model (icon_caption) then captions each detected element. The result is a fine-grained parse that covers small icons and predicts whether each element is interactable, both of which matter for precise agent control.
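The two-stage flow described above can be sketched as follows. This is a minimal illustration, not OmniParser's actual API: the names `Detection`, `ParsedElement`, `parse_screen`, and `min_confidence`, as well as the stub detector and captioner, are hypothetical and stand in for the real detection and captioning models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Bounding box as (x1, y1, x2, y2); normalized coordinates are an assumption.
BBox = Tuple[float, float, float, float]


@dataclass
class Detection:
    """Hypothetical output of stage 1 (region detection)."""
    bbox: BBox
    interactable: bool
    confidence: float


@dataclass
class ParsedElement:
    """Hypothetical structured element handed to the agent."""
    element_id: int
    bbox: BBox
    interactable: bool
    description: str


def parse_screen(
    image: object,
    detect: Callable[[object], List[Detection]],
    caption: Callable[[object, BBox], str],
    min_confidence: float = 0.3,
) -> List[ParsedElement]:
    """Run detection, drop low-confidence boxes, then caption each survivor."""
    elements: List[ParsedElement] = []
    for det in detect(image):  # stage 1: locate candidate UI regions
        if det.confidence < min_confidence:
            continue
        description = caption(image, det.bbox)  # stage 2: describe the region
        elements.append(
            ParsedElement(len(elements), det.bbox, det.interactable, description)
        )
    return elements


# Stub models so the sketch runs without any weights (purely illustrative).
def fake_detect(image: object) -> List[Detection]:
    return [
        Detection((0.10, 0.10, 0.20, 0.15), True, 0.90),
        Detection((0.50, 0.50, 0.90, 0.60), False, 0.80),
        Detection((0.00, 0.00, 0.01, 0.01), True, 0.05),  # filtered out
    ]


def fake_caption(image: object, bbox: BBox) -> str:
    return f"element at {bbox[0]:.2f},{bbox[1]:.2f}"


elements = parse_screen(object(), fake_detect, fake_caption)
for e in elements:
    print(e.element_id, e.bbox, e.interactable, e.description)
```

Passing the detector and captioner as callables keeps the sketch independent of any particular model; in practice they would wrap the icon_detect and icon_caption checkpoints.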

Quick Start & Requirements

  • Clone the repository, then install dependencies with pip install -r requirements.txt.
  • Requires Python 3.12 and the V2 model weights, which are downloaded from Hugging Face.
  • An official demo is available as a Hugging Face Space.

Highlighted Details

  • Reports state-of-the-art results on the ScreenSpot-Pro grounding benchmark.
  • V1.5 adds finer-grained detection of small icons and predicts whether each element is interactable.
  • OmniTool controls a Windows 11 VM using OmniParser together with an LLM of your choice.
  • Supports local trajectory logging for agent training data pipelines.
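To make the trajectory-logging idea concrete, here is a minimal sketch of appending agent steps to a local JSON Lines file. The record fields, the file name, and the helpers `log_step` and `load_trajectory` are assumptions made for illustration; the project's actual log format may differ.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def log_step(
    log_path: Path,
    step: int,
    screenshot_file: str,
    action: str,
    target_element_id: Optional[int] = None,
    text: Optional[str] = None,
) -> dict:
    """Append one agent step as a single JSON line (hypothetical schema)."""
    record = {
        "step": step,
        "timestamp": time.time(),
        "screenshot": screenshot_file,
        "action": action,
        "target_element_id": target_element_id,
        "text": text,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def load_trajectory(log_path: Path) -> List[dict]:
    """Read the logged steps back in order, skipping blank lines."""
    with open(log_path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo: write two steps to a temporary file and read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trajectory.jsonl"
    log_step(path, 0, "step_000.png", "click", target_element_id=3)
    log_step(path, 1, "step_001.png", "type", text="hello")
    steps = load_trajectory(path)

print(len(steps), [s["action"] for s in steps])
```

One record per line keeps the log append-only and easy to stream into a training data pipeline without loading the whole file.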

Maintenance & Community

  • Active development with V2 checkpoints released in Feb 2025.
  • Project page and V2 blog post linked in the README.

Licensing & Compatibility

  • Model checkpoints carry separate licenses: icon_detect is AGPL (inherited from the original YOLO model), while the icon_caption models are MIT.
  • The AGPL terms may restrict commercial or closed-source use of the detection model.

Limitations & Caveats

The AGPL license on the icon_detect model may restrict its use in proprietary software. Documentation for newer features such as multi-agent orchestration is still in progress.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 2
  • Issues (30d): 8

Star History

  • 1,239 stars in the last 90 days
