Discover and explore top open-source AI tools and projects—updated daily.
Cross-platform computer use agents for GUI automation
Top 51.7% on SourcePulse
OpenGVLab/ScaleCUA provides a solution for developing robust, general-purpose Computer Use Agents (CUAs) for GUI automation, addressing the critical challenge of data scarcity for cross-platform operations. It offers a large-scale, cross-platform dataset and trained models designed to enable CUAs that operate seamlessly across diverse environments, delivering significant performance improvements. This project is targeted at researchers and engineers aiming to advance autonomous GUI interaction capabilities.
How It Works
The project's core innovation is its data-driven scaling approach. ScaleCUA introduces a large-scale, cross-platform dataset spanning six operating systems and three GUI-centric task domains, collected via a closed-loop pipeline integrating automated agents and human expertise. By training on this extensive dataset, ScaleCUA models achieve enhanced transferability and robust performance across various platforms, overcoming limitations of previous CUA development constrained by data scale and domain specificity.
Quick Start & Requirements
To set up, clone the repository (git clone https://github.com/OpenGVLab/ScaleCUA.git
), navigate into the directory, and install dependencies (pip install -r requirements.txt
). Key requirements include vLLM
for model deployment and evaluation. The project offers pre-configured virtual machines for Ubuntu, Android, and Web environments to facilitate interactive playground use. Further details on model deployment, playground setup, and evaluation can be found in the respective README files within the repository.
Highlighted Details
Maintenance & Community
The project released its models and code on September 19, 2025, with the ScaleCUA-Data dataset pending upload to HuggingFace. The README acknowledges contributions from numerous open-source projects. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the text.
Licensing & Compatibility
The project is licensed under the Apache 2.0 License. This permissive license generally allows for commercial use and integration into closed-source projects without significant restrictions.
Limitations & Caveats
The associated paper is listed as a preprint, indicating it may not have undergone formal peer review. The ScaleCUA-Data dataset is still being uploaded to HuggingFace, meaning full data access may be delayed. The project's functionality is tied to specific vision-language models (Qwen2.5-VL, InternVL) and deployment frameworks (vLLM).
1 week ago
Inactive