UI-Venus by inclusionAI

UI agents for precise GUI interaction

Created 6 months ago

1,136 stars

Top 33.7% on SourcePulse

Project Summary

UI-Venus is an open-source UI agent for precise GUI element grounding and navigation across mobile, desktop, and web interfaces. It combines Reinforcement Fine-Tuning (RFT) with an action-level reward design and reports state-of-the-art results on grounding and navigation benchmarks such as ScreenSpot-Pro and Android World, with the stated goal of more robust and generalizable autonomous UI interaction.

How It Works

UI-Venus applies Reinforcement Fine-Tuning (RFT) with fine-grained, action-wise reward functions for GUI navigation. Rewarding individual actions improves credit assignment in long-horizon tasks and makes action prediction learnable end to end. The project also emphasizes data quality: a three-stage refinement pipeline (Prompt Rewrite, Trace Editing, Trace Generation) is used to improve the fidelity of the training signal, which the authors credit for more robust and generalizable agents.
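To make the idea of an action-wise reward concrete, here is a minimal sketch of scoring one predicted GUI step against a reference step. Everything in it is an assumption for illustration: the `Action` type, the `action_reward` and `trajectory_rewards` helpers, the 0.5/0.5 weighting, and the coordinate tolerance are hypothetical and are not taken from the UI-Venus report or code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Action:
    """Hypothetical GUI action: a type plus optional point and text arguments."""
    kind: str                                    # e.g. "click", "type", "scroll"
    point: Optional[Tuple[float, float]] = None  # normalized (x, y) in [0, 1]
    text: Optional[str] = None


def action_reward(pred: Action, gold: Action, tol: float = 0.05) -> float:
    """Score one predicted step against a reference step.

    Assumed scheme (not from the source): 0.5 for the correct action type,
    plus 0.5 if the arguments also match the reference.
    """
    if pred.kind != gold.kind:
        return 0.0
    reward = 0.5
    args_ok = True
    if gold.point is not None:
        if pred.point is None:
            args_ok = False
        else:
            dx = abs(pred.point[0] - gold.point[0])
            dy = abs(pred.point[1] - gold.point[1])
            args_ok = dx <= tol and dy <= tol
    if args_ok and gold.text is not None:
        args_ok = (
            pred.text is not None
            and pred.text.strip().lower() == gold.text.strip().lower()
        )
    if args_ok:
        reward += 0.5
    return reward


def trajectory_rewards(preds: List[Action], golds: List[Action]) -> List[float]:
    """Per-step rewards, so each action gets its own training signal."""
    return [action_reward(p, g) for p, g in zip(preds, golds)]
```

Because every step receives its own score, a long trajectory with one wrong click is penalized only at that step instead of receiving a single pass/fail signal at the end.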

Quick Start & Requirements

Installation: pip install -r requirements.txt
Prerequisites: Python and the dependencies listed in requirements.txt.
Configuration: For grounding, set screenspot_imgs, screenspot_test, model_name_or_path, and log_path. For navigation, set model_path, input_file, and output_file.
Data Format: Example input/output formats for grounding and navigation are provided in the examples/ directory.
Links: Hugging Face Model, GitHub Repository.
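The configuration values listed above can be gathered and sanity-checked before launching a run. The sketch below is only an illustration: the field names come from the summary, but the `GroundingConfig` and `NavigationConfig` classes, the `unset_fields` helper, and all example paths are hypothetical, and how the repository's own scripts consume these values is not shown here.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class GroundingConfig:
    """Settings named in the summary for grounding evaluation."""
    screenspot_imgs: str
    screenspot_test: str
    model_name_or_path: str
    log_path: str


@dataclass
class NavigationConfig:
    """Settings named in the summary for navigation inference."""
    model_path: str
    input_file: str
    output_file: str


def unset_fields(cfg) -> List[str]:
    """Return the names of settings that are empty or whitespace only."""
    return [f.name for f in fields(cfg) if not getattr(cfg, f.name).strip()]


# Placeholder values for illustration; substitute real local paths.
nav = NavigationConfig(
    model_path="/path/to/checkpoint",
    input_file="",
    output_file="/path/to/output.json",
)
missing = unset_fields(nav)
if missing:
    print("Set these before running:", ", ".join(missing))
```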

Highlighted Details

Achieves state-of-the-art (SOTA) results on benchmarks including ScreenSpot-Pro, ScreenSpot-v2, OS-World-G, UI-Vision, and Android World.
Offers both 7B and 72B model checkpoints.
Includes a full evaluation pipeline and inference scripts.
Reports strong results in visual grounding, UI navigation, cross-platform generalization, and complex task reasoning.
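For readers unfamiliar with how grounding results like those above are typically scored, the sketch below assumes the point-in-box criterion commonly used for ScreenSpot-style benchmarks: a prediction counts as correct when the predicted click point lies inside the target element's bounding box. The function names are hypothetical, and this is not the project's evaluation pipeline.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def point_in_box(point: Point, box: Box) -> bool:
    """True if the point lies inside the box, edges included."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(points: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predicted points that land inside their target boxes."""
    if not points:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(points, boxes))
    return hits / len(points)
```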

Maintenance & Community

The project is associated with a technical report available on arXiv (arXiv:2508.10833).

Licensing & Compatibility

The repository is open-source, with model checkpoints available on Hugging Face. The provided README excerpt does not state a specific license, so compatibility with commercial or derivative use cannot be confirmed from it.

Limitations & Caveats

The README notes that some comparison results for other models are reproduced or come from closed-source models, so evaluation setups may differ across the reported numbers. Hardware requirements (e.g., GPU, VRAM) for running the larger 72B checkpoint are not specified.

Health Check

Last Commit: 21 hours ago
Responsiveness: Inactive

Star History: 539 stars in the last 30 days