ClawGUI by ZJU-REAL

Full-stack framework for GUI agents

Created 3 months ago

1,316 stars

Top 29.7% on SourcePulse

Project Summary

Summary

ClawGUI is a full-stack research framework for building, evaluating, and deploying GUI agents. It unifies online reinforcement learning (RL) training, standardized benchmarks, and real-device deployment, streamlining the agent development lifecycle for researchers and developers.

How It Works

The framework comprises three modules: ClawGUI-RL for training, ClawGUI-Eval for evaluation, and ClawGUI-Agent for deployment. ClawGUI-RL runs scalable online RL against parallel Dockerized Android emulators or physical devices, and pairs GiGPO with a process reward model (PRM) so that policy optimization is driven by fine-grained, step-level rewards. ClawGUI-Eval provides a standardized measurement baseline through a three-stage Infer-Judge-Metric pipeline covering 6 benchmarks and 11+ models, with over 95% result reproduction fidelity. ClawGUI-Agent handles real-world deployment: users control mobile devices (Android, HarmonyOS, iOS) in natural language through 12+ chat platforms, and evaluation can be launched with a single command.
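To make the three-stage Infer-Judge-Metric idea concrete, the following is a minimal sketch of how such a pipeline can be separated into independent stages. It is not ClawGUI-Eval's actual code: the names Sample, Prediction, infer, judge, and metric, the exact-match judging rule, and the toy model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Sample:
    task_id: str
    instruction: str
    expected: str  # hypothetical ground-truth action label


@dataclass
class Prediction:
    task_id: str
    output: str


def infer(samples: list[Sample], model: Callable[[str], str]) -> list[Prediction]:
    """Stage 1 (Infer): run the model on every sample and keep the raw outputs."""
    return [Prediction(s.task_id, model(s.instruction)) for s in samples]


def judge(samples: list[Sample], predictions: list[Prediction]) -> dict[str, bool]:
    """Stage 2 (Judge): mark each prediction correct or incorrect.

    Exact string match is an assumption; a real benchmark may use
    coordinate tolerances or task-specific checks instead.
    """
    expected = {s.task_id: s.expected for s in samples}
    return {p.task_id: p.output.strip() == expected[p.task_id] for p in predictions}


def metric(verdicts: dict[str, bool]) -> float:
    """Stage 3 (Metric): aggregate verdicts into a single accuracy score."""
    return sum(verdicts.values()) / len(verdicts) if verdicts else 0.0


# Toy usage: a stand-in "model" that always answers CLICK.
samples = [
    Sample("t1", "Open the settings app", "CLICK"),
    Sample("t2", "Enter the Wi-Fi password", "TYPE"),
]
predictions = infer(samples, lambda instruction: "CLICK")
verdicts = judge(samples, predictions)
score = metric(verdicts)
print(score)  # 0.5
```

Keeping the stages separate means stored predictions can be re-judged or re-scored without rerunning inference, which is one plausible reason a pipeline like this helps with reproducibility.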

Quick Start & Requirements

Clone the repository (git clone https://github.com/ZJU-REAL/ClawGUI.git) and change into the directory (cd ClawGUI). Each module has its own setup instructions. Python 3.12 is required, and Docker is needed for parallel-environment training. More details are on the Project Page.
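Before following the per-module setup, the two stated requirements can be checked with a short preflight script. This is a sketch that is not part of the repository: the function name check_environment is hypothetical, and treating "Python 3.12" as an exact major.minor match (rather than a minimum) is an assumption.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 12)  # assumption: exact major.minor match is intended


def check_environment(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    problems = []
    if version != REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]} required, "
            f"found {version[0]}.{version[1]}"
        )
    # Docker is only needed for parallel-environment training.
    if which("docker") is None:
        problems.append("docker not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The version and which parameters exist only so the check can be exercised without changing the real interpreter or PATH.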

Highlighted Details

The end-to-end trained ClawGUI-2B agent reaches a MobileWorld success rate (SR) of 17.1, against 11.1 for the baseline, a gain of 6.0 points.
ClawGUI-Eval supports comparable research by reproducing results with 95.8% fidelity across 6 benchmarks and 11+ models.
ClawGUI-RL supports training on dozens of parallel Dockerized Android emulators or physical devices.
ClawGUI-Agent enables cross-platform device control (Android, HarmonyOS, iOS) via 12+ chat platforms.

Maintenance & Community

Contribution guidelines are available in CONTRIBUTING.md. The project acknowledges dependencies on other open-source efforts. No specific community channels or maintainer details are provided.

Licensing & Compatibility

Licensed under the Apache License 2.0, this project permits commercial use and integration into closed-source applications.

Limitations & Caveats

The framework currently focuses on mobile GUI agents. Future work includes expanding to desktop/web environments for online RL, enabling on-device ClawGUI-Agent deployment for privacy, and integrating real-time RL. As a research framework, ongoing development and refinement are expected.
