ScreenSpot-Pro-GUI-Grounding by likaixin2000

GUI grounding for professional high-resolution computer interaction

Created 9 months ago
253 stars

Top 99.4% on SourcePulse

Project Summary

ScreenSpot-Pro-GUI-Grounding addresses the challenge of precise GUI grounding in professional, high-resolution computer environments. It introduces the SE-GUI model, offering enhanced accuracy in understanding and interacting with graphical user interfaces. The project benefits researchers and developers in UI automation, multimodal AI, and large language model applications who need robust GUI comprehension.

How It Works

The project's core contribution is the SE-GUI model, which achieves 47.2% accuracy with its 7B-parameter version and 35.9% with its 3B version, trained on a dataset of only 3,000 open-source samples. Through ScreenSpot-v2-variants, it supports diverse interaction paradigms: original instructions, action-based commands, target UI descriptions, and negative instructions, enabling flexible and nuanced GUI control.

Quick Start & Requirements

  • Setup: Requires setting the OPENAI_API_KEY environment variable.
  • Evaluation: Evaluation can be initiated using provided shell scripts, such as run_ss_pro.sh.
  • Prerequisites: An OpenAI API key is mandatory. No other hardware or software dependencies are detailed in the README.
  • Documentation: An arXiv paper is referenced for further details, though no direct URL is supplied.
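The setup and evaluation steps above can be sketched as a short shell session. The script name `run_ss_pro.sh` comes from the repository; the key value below is a placeholder, not a real credential:

```shell
# Required: the evaluation scripts call the OpenAI API.
export OPENAI_API_KEY="your-key-here"  # placeholder; substitute your own key

# From the repository root, launch the ScreenSpot-Pro evaluation
# (uncomment inside the cloned repo):
# bash run_ss_pro.sh
```

Other evaluation scripts in the repository can presumably be invoked the same way.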

Highlighted Details

  • The SE-GUI model demonstrates strong performance, achieving 47.2% accuracy with a 7B model and 35.9% with a 3B model, trained on a modest 3k sample dataset.
  • The ScreenSpot-Pro benchmark is used to evaluate several prominent AI projects, including Omniparser v2, Qwen2.5-VL, UI-TARS, UGround, and AGUVIS.
  • Offers flexibility through ScreenSpot-v2-variants, supporting multiple instruction styles like original, action, target UI description, and negative instructions for varied user interaction needs.

Maintenance & Community

The README does not mention project maintainers, community channels (such as Discord or Slack), a roadmap, or notable contributors.

Licensing & Compatibility

No license is specified in the README; adopters should verify licensing before making adoption decisions.

Limitations & Caveats

The README does not explicitly state any limitations or known issues. However, the relatively small training dataset size (3,000 samples) for the SE-GUI model might warrant consideration regarding its generalization capabilities across a wider range of GUIs and scenarios.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 9 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (coauthor of AutoGen; Research Scientist at Microsoft Research), Yaowei Zheng (author of LLaMA-Factory), and 2 more.

UI-TARS-desktop by bytedance

  • 1.1% · 19k stars
  • GUI agent app for computer control via natural language
  • Created 8 months ago · Updated 16 hours ago