ScaleCUA by OpenGVLab

Cross-platform computer use agents for GUI automation

Created 6 months ago

1,092 stars

Top 34.8% on SourcePulse

Project Summary

OpenGVLab/ScaleCUA provides a solution for developing robust, general-purpose Computer Use Agents (CUAs) for GUI automation, addressing the critical challenge of data scarcity for cross-platform operations. It offers a large-scale, cross-platform dataset and trained models designed to enable CUAs that operate seamlessly across diverse environments, delivering significant performance improvements. This project is targeted at researchers and engineers aiming to advance autonomous GUI interaction capabilities.

How It Works

The project's core innovation is its data-driven scaling approach. ScaleCUA introduces a large-scale, cross-platform dataset spanning six operating systems and three GUI-centric task domains, collected via a closed-loop pipeline integrating automated agents and human expertise. By training on this extensive dataset, ScaleCUA models achieve enhanced transferability and robust performance across various platforms, overcoming limitations of previous CUA development constrained by data scale and domain specificity.

Quick Start & Requirements

To set up, clone the repository (git clone https://github.com/OpenGVLab/ScaleCUA.git), navigate into the directory, and install dependencies (pip install -r requirements.txt). Key requirements include vLLM for model deployment and evaluation. The project offers pre-configured virtual machines for Ubuntu, Android, and Web environments to facilitate interactive playground use. Further details on model deployment, playground setup, and evaluation can be found in the respective README files within the repository.

Highlighted Details

ScaleCUA-Data: A large-scale, cross-platform dataset covering 6 operating systems and 3 GUI task domains.
ScaleCUA-Models: A general-purpose agent capable of cross-platform GUI task completion.
Performance: Achieves state-of-the-art results, including 94.4% on MMBench-GUI L1-Hard and 47.4% on WebArena-Lite-v2, with significant gains over baselines (+26.6 on WebArena-Lite-v2).
Platform Support: Operates across Ubuntu, Android, macOS, Web, and Windows environments.
Training Framework: Supports training agents using Qwen2.5-VL and InternVL models.

Maintenance & Community

The project released its models and code on September 19, 2025, with the ScaleCUA-Data dataset pending upload to HuggingFace. The README acknowledges contributions from numerous open-source projects. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the text.

Licensing & Compatibility

The project is licensed under the Apache 2.0 License. This permissive license generally allows for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The associated paper is listed as a preprint, indicating it may not have undergone formal peer review. The ScaleCUA-Data dataset is still being uploaded to HuggingFace, meaning full data access may be delayed. The project's functionality is tied to specific vision-language models (Qwen2.5-VL, InternVL) and deployment frameworks (vLLM).

ScaleCUA by OpenGVLab

Explore Similar Projects

Awesome-GUI-Agents by ZJU-REAL

EvoCUA by meituan

SeeClick by njucckevin

ShowUI-Aloha by showlab

CogAgent by zai-org

acu by trycua

ShowUI by showlab

MAI-UI by Tongyi-MAI

terminator by mediar-ai

computer_use_ootb by showlab

fara by microsoft

UI-TARS-desktop by bytedance