UI-Venus by inclusionAI

UI agents for precise GUI interaction

Created 1 month ago
479 stars

Top 63.9% on SourcePulse

Project Summary

UI-Venus is an open-source UI agent for precise GUI element grounding and navigation across mobile, desktop, and web interfaces. It uses Reinforcement Fine-Tuning (RFT) with an action-level reward design and reports state-of-the-art results on several grounding and navigation benchmarks, with the aim of more robust and generalizable autonomous UI interaction.

How It Works

UI-Venus applies Reinforcement Fine-Tuning (RFT) with fine-grained, action-wise reward functions for GUI navigation. Rewarding individual actions improves credit assignment in long-horizon tasks and makes action prediction learnable end to end. The project also emphasizes data quality: a three-stage data refinement pipeline (Prompt Rewrite, Trace Editing, Trace Generation) improves the fidelity of the training signal.
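As a rough illustration of what an action-wise reward could look like, the sketch below scores one predicted action against a reference action. It is not the reward from the UI-Venus report: the Action fields, the weights, and the point-in-box rule for clicks are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical types for illustration; not taken from the UI-Venus codebase.
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Action:
    kind: str                                    # e.g. "click", "type", "scroll"
    point: Optional[Tuple[float, float]] = None  # click coordinates, if any
    text: Optional[str] = None                   # typed text, if any


def action_reward(pred: Action, ref: Action, ref_box: Optional[Box] = None,
                  w_type: float = 0.5, w_param: float = 0.5) -> float:
    """Score a single step: partial credit for the right action type,
    the remaining credit for correct parameters (assumed weighting)."""
    if pred.kind != ref.kind:
        return 0.0
    reward = w_type
    if pred.kind == "click":
        # Assumed rule: a click is correct if it lands inside the target box.
        if pred.point is not None and ref_box is not None:
            x, y = pred.point
            x1, y1, x2, y2 = ref_box
            if x1 <= x <= x2 and y1 <= y <= y2:
                reward += w_param
    elif pred.kind == "type":
        # Assumed rule: typed text must match after trimming and lowercasing.
        if (pred.text is not None and ref.text is not None
                and pred.text.strip().lower() == ref.text.strip().lower()):
            reward += w_param
    else:
        # Actions without parameters in this sketch get full credit on type match.
        reward += w_param
    return reward
```

Scoring each step separately gives every action in a trajectory its own signal instead of a single end-of-episode reward, which is the credit-assignment benefit described above.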

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites: Python and the dependencies listed in requirements.txt.
  • Configuration: Grounding requires paths for screenspot_imgs, screenspot_test, model_name_or_path, and log_path; navigation requires model_path, input_file, and output_file.
  • Data Format: Example input/output formats for grounding and navigation are provided in the examples/ directory.
  • Links: Hugging Face Model, GitHub Repository.
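The configuration keys above can be checked before launching a run. The following is a minimal sketch that assumes the settings are collected in a plain dictionary; how the repository's scripts actually receive them (command-line flags, shell variables, or a config file) is not stated here, and REQUIRED_KEYS and missing_keys are hypothetical helper names.

```python
# Key names come from the configuration list above; the helper itself is hypothetical.
REQUIRED_KEYS = {
    "grounding": ("screenspot_imgs", "screenspot_test",
                  "model_name_or_path", "log_path"),
    "navigation": ("model_path", "input_file", "output_file"),
}


def missing_keys(mode: str, config: dict) -> list:
    """Return the required keys that are absent or empty for the given mode."""
    if mode not in REQUIRED_KEYS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [key for key in REQUIRED_KEYS[mode] if not config.get(key)]


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    nav_config = {"model_path": "path/to/model", "input_file": "path/to/input"}
    print(missing_keys("navigation", nav_config))
```

An empty list means every required setting for that mode is present.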

Highlighted Details

  • Reports state-of-the-art (SOTA) results on benchmarks including ScreenSpot-Pro, ScreenSpot-v2, OS-World-G, UI-Vision, and Android World.
  • Offers both 7B and 72B model checkpoints.
  • Includes a full evaluation pipeline and inference scripts.
  • Reports strong capability in visual grounding, UI navigation, cross-platform generalization, and complex task reasoning.

Maintenance & Community

  • A technical report is available on arXiv (arXiv:2508.10833).

Licensing & Compatibility

  • The repository is open-source, with model checkpoints available on Hugging Face. Specific license terms are not stated in the provided README excerpt, so compatibility with commercial or derivative use cannot be assumed from the open-source label alone.

Limitations & Caveats

  • The README notes that some comparison results for other models are reproduced by the authors or come from closed-source models, so evaluation setups may differ across entries.
  • Hardware requirements (e.g., GPU, VRAM) for running the larger 72B model are not detailed.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 15
  • Star History: 352 stars in the last 30 days
