GUI-G2 by ZJU-REAL

Gaussian reward modeling for precise GUI grounding

Created 5 months ago
286 stars

Top 91.7% on SourcePulse

Project Summary

GUI-G² introduces a Gaussian reward modeling framework for training models on GUI grounding tasks. Instead of the sparse, binary rewards common in reinforcement learning, it models the Gaussian-like spatial distribution of human clicks around a target, so the reward reflects how close a prediction is rather than only whether it hit. The approach is aimed at researchers and developers working on human-computer interaction, vision-language models, and automated UI agents.

How It Works

GUI-G² uses a Gaussian reward framework inspired by human click behavior observed in datasets such as AITW. It combines two reward functions: the Gaussian Point Reward, which rewards predictions close to the target center, and the Gaussian Coverage Reward, which encourages spatial alignment with the target area. An Adaptive Variance Mechanism adjusts the reward granularity to the scale of the GUI element. The resulting dense reward signal provides smoother gradients than sparse, binary RL rewards, which makes early-stage learning more efficient.
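The three components described above can be illustrated with a minimal sketch. This is not the project's implementation: the summary does not give the exact formulas, so the code assumes boxes in (x1, y1, x2, y2) pixel form, a standard deviation proportional to the element's width and height (the factor ALPHA is hypothetical), and the Bhattacharyya coefficient between two axis-aligned Gaussians as one plausible overlap measure for the coverage term. All function names are illustrative.

```python
import math

ALPHA = 0.5  # hypothetical scale factor linking element size to sigma


def center(box):
    """Return the (x, y) center of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0


def adaptive_sigma(box, alpha=ALPHA):
    """Adaptive variance (assumed form): sigma grows with width and height."""
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError("box must have positive width and height")
    return alpha * (x2 - x1), alpha * (y2 - y1)


def gaussian_point_reward(pred_box, gt_box, alpha=ALPHA):
    """Reward in (0, 1] that decays smoothly as the predicted center
    moves away from the target center."""
    (px, py), (gx, gy) = center(pred_box), center(gt_box)
    sx, sy = adaptive_sigma(gt_box, alpha)
    return math.exp(-0.5 * (((px - gx) / sx) ** 2 + ((py - gy) / sy) ** 2))


def gaussian_coverage_reward(pred_box, gt_box, alpha=ALPHA):
    """Overlap of two axis-aligned Gaussians, one per box, measured with
    the Bhattacharyya coefficient (an assumption of this sketch)."""
    (px, py), (gx, gy) = center(pred_box), center(gt_box)
    psx, psy = adaptive_sigma(pred_box, alpha)
    gsx, gsy = adaptive_sigma(gt_box, alpha)
    distance = 0.0  # Bhattacharyya distance, summed over the x and y axes
    for mu_p, mu_g, s_p, s_g in ((px, gx, psx, gsx), (py, gy, psy, gsy)):
        var_sum = s_p ** 2 + s_g ** 2
        distance += (mu_p - mu_g) ** 2 / (4.0 * var_sum)
        distance += 0.5 * math.log(var_sum / (2.0 * s_p * s_g))
    return math.exp(-distance)


if __name__ == "__main__":
    gt = (100, 100, 200, 140)    # ground-truth element box
    pred = (110, 105, 190, 150)  # slightly shifted, differently sized prediction
    print("point reward:   ", round(gaussian_point_reward(pred, gt), 4))
    print("coverage reward:", round(gaussian_coverage_reward(pred, gt), 4))
```

Both rewards equal 1 when the predicted box matches the target exactly and fall off continuously otherwise. Because sigma scales with the element, the same pixel offset is penalized more on a small icon than on a large panel.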

Quick Start & Requirements

  • Installation: Create a conda environment (conda create -n gui-g2 python=3.10), activate it (conda activate gui-g2), and run bash setup.sh. If installing dependencies manually, use transformers==4.49.0 and deepspeed==0.15.4.
  • Prerequisites: Python 3.10, transformers, and deepspeed. The example code uses device_map="cuda", so CUDA-enabled hardware is likely needed for efficient inference and training.
  • Models: Pre-trained GUI-G2-3B and GUI-G2-7B models are available on Hugging Face. The README provides download commands.
  • Links: Project Page: https://zju-real.github.io/GUI-G2, Code: https://github.com/zju-real/GUI-G2, Paper: https://arxiv.org/abs/2507.15846.

Highlighted Details

  • Achieves state-of-the-art performance on the ScreenSpot, ScreenSpot-v2, and ScreenSpot-Pro datasets.
  • Offers pre-trained models in 3B and 7B parameter sizes.
  • The Gaussian reward mechanism provides dense learning signals, improving gradient smoothness over binary RL rewards.
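The difference between a dense and a binary signal can be seen in a small numeric comparison. The sketch below is illustrative only: the hit-test reward, the Gaussian form, and the factor alpha are assumptions chosen for demonstration, not the project's code.

```python
import math


def binary_reward(click, box):
    """Sparse baseline: 1 if the click lands inside the box, else 0."""
    x, y = click
    x1, y1, x2, y2 = box
    return 1.0 if x1 <= x <= x2 and y1 <= y <= y2 else 0.0


def gaussian_reward(click, box, alpha=0.5):
    """Dense alternative (assumed form): a Gaussian centered on the box,
    with sigma proportional to the box width and height."""
    x, y = click
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sx, sy = alpha * (x2 - x1), alpha * (y2 - y1)
    return math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))


if __name__ == "__main__":
    box = (100, 100, 200, 140)
    # Move the click to the right, from the center to well outside the box.
    for x in (150, 190, 230, 270):
        click = (x, 120)
        print(x, binary_reward(click, box), round(gaussian_reward(click, box), 3))
```

Once the click leaves the box, the binary reward is 0 for every miss, so a near miss and a far miss look identical to the learner. The Gaussian reward still ranks the nearer miss higher, which is the kind of graded signal the bullet above refers to.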

Maintenance & Community

The paper was released in July 2025, the 3B and 7B models were open-sourced in August 2025, and the project announced the paper's acceptance to AAAI 2026 in November 2025. The code repository on GitHub is the primary community hub.

Licensing & Compatibility

The README does not specify a software license. Without explicit licensing terms, commercial use and closed-source integration cannot be evaluated, which is a significant blocker for adoption.

Limitations & Caveats

Evaluation checkpoints are listed as "will be released soon," which suggests the evaluation setup is still under development or not yet finalized. As a recent research contribution accepted to AAAI 2026, the project may still be evolving.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 44 stars in the last 30 days
