Framework for computer-use agents
OpenCUA provides a comprehensive open-source framework for scaling computer-use agent (CUA) data and foundation models. It addresses the need for robust, generalizable agents capable of performing complex tasks across various applications and operating systems. The framework is targeted at researchers and developers in AI, particularly those working on embodied AI, robotics, and intelligent automation, offering a significant advancement in open-source CUA capabilities.
How It Works
OpenCUA comprises AgentNet, a large-scale dataset of human computer-use demonstrations; AgentNetTool, an annotation infrastructure for capturing these demonstrations; AgentNetBench, an offline evaluator for benchmarking agent actions; and OpenCUA Models, end-to-end foundation models trained on the AgentNet dataset. The core innovation lies in the scale and diversity of the AgentNet dataset, coupled with the framework's ability to process raw demonstrations into concise state-action pairs and synthesize reflective Chain-of-Thought (CoT) reasoning, which enhances model robustness and generalization.
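To make the data-processing idea concrete, the sketch below models what a processed AgentNet training example might look like: a state-action pair augmented with reflective CoT reasoning. The field names and structure are illustrative assumptions, not the dataset's actual schema.

```python
# Illustrative sketch only: field names and structure are assumptions,
# not the actual AgentNet schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class StateActionExample:
    screenshot_path: str   # observed state: a screen capture of the desktop
    instruction: str       # the user's high-level task description
    reflection: str        # reflective CoT: what the previous action achieved
    plan: str              # reflective CoT: reasoning about the next step
    action: str            # concise executable action, e.g. a pyautogui call
    history: List[str] = field(default_factory=list)  # earlier actions for context

example = StateActionExample(
    screenshot_path="step_03.png",
    instruction="Export the open spreadsheet as a PDF.",
    reflection="The File menu is now open; the Export option is visible.",
    plan="Click 'Export as PDF' to open the export dialog.",
    action="pyautogui.click(x=412, y=389)",
    history=["pyautogui.click(x=25, y=48)"],
)
```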
Quick Start & Requirements
- Install dependencies with pip install -r requirement.txt within a conda environment (conda create -n opencua python=3.10).
- Use huggingface_hub to download model weights (e.g., xlangai/OpenCUA-7B).
- Requirements: conda, huggingface_hub. Specific model versions may require alignment with Kimi-VL's Tokenizer and ChatTemplate.
- Run inference with python huggingface_inference.py in the ./model/inference/ directory.
- Run agents in the OSWorld environment using the provided commands (e.g., python run_multienv_opencua.py ...).
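For the weight-download step, a minimal Python sketch using huggingface_hub is shown below; the repo id is the one named above, while the local directory is an arbitrary choice, not a project convention.

```python
# Minimal sketch: download OpenCUA-7B weights with huggingface_hub.
# The local_dir path is an arbitrary example, not a project convention.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="xlangai/OpenCUA-7B",    # model repo referenced in the quick start
    local_dir="./models/OpenCUA-7B"  # where to place the weights locally
)
print(f"Model weights downloaded to: {local_path}")
```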
Highlighted Details
Maintenance & Community
The project acknowledges contributions from various individuals and teams, including Moonshot AI and the Kimi Team, and is built upon DuckTrack and OpenAdapt. Further details on community channels or roadmap are not explicitly provided in the README.
Licensing & Compatibility
The project is intended for research and educational purposes only. Prohibited uses include any activity violating applicable laws or regulations, and illegal, unethical, or harmful activities. The authors disclaim responsibility for any misuse.
Limitations & Caveats
vLLM support is currently in progress; users are advised to use the standard transformers library in the meantime. The training code is also under development, and the released models were trained on the Kimi Team's infrastructure.
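Since vLLM is not yet supported, a minimal transformers-based loading sketch is shown below. It assumes the Hugging Face checkpoint ships custom model code (hence trust_remote_code=True); the exact prompt and image-handling interface may differ from this text-only placeholder, and the released huggingface_inference.py script remains the reference path.

```python
# Hedged sketch: load OpenCUA-7B with the standard transformers library.
# Assumes the checkpoint provides custom remote code; the generation call
# shown is a generic text-only placeholder, not the project's exact API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xlangai/OpenCUA-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # assumption: bf16 fits the 7B model on a modern GPU
    device_map="auto",
    trust_remote_code=True,
)

# Text-only smoke test; real computer-use inference also feeds screenshots,
# which the script in ./model/inference/ handles end to end.
inputs = tokenizer("Describe the next GUI action to open a browser.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```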