ClawGUI by ZJU-REAL

Full-stack framework for GUI agents

Created 3 weeks ago

906 stars

Top 39.8% on SourcePulse

Summary

ClawGUI is a full-stack research framework for building, evaluating, and deploying GUI agents. It unifies online reinforcement learning (RL) training, standardized benchmarks, and real-device deployment, streamlining the agent development lifecycle for researchers and developers.

How It Works

The framework comprises three modules: ClawGUI-RL for training, ClawGUI-Eval for evaluation, and ClawGUI-Agent for deployment. ClawGUI-RL runs online RL at scale on parallel Dockerized Android emulators or physical devices, and combines GiGPO with a process reward model (PRM) so that policy optimization receives fine-grained rewards. ClawGUI-Eval provides a standardized measurement baseline: a three-stage Infer-Judge-Metric pipeline covering 6 benchmarks and 11+ models, which reproduces results with over 95% fidelity. ClawGUI-Agent handles real-world deployment, letting users control mobile devices (Android, HarmonyOS, iOS) in natural language through 12+ chat platforms, with one-command evaluation built in.
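The three-stage Infer-Judge-Metric flow can be pictured with a minimal Python sketch. This is not ClawGUI-Eval's actual API: the names `Sample`, `infer`, `judge`, `metric`, and `run_pipeline` are hypothetical, and the judging rule (a predicted click counts as correct when it lands inside the target bounding box) is an assumption chosen only to keep the example small.

```python
from dataclasses import dataclass
from typing import Callable

Point = tuple[int, int]


@dataclass
class Sample:
    """One benchmark item (hypothetical schema, for illustration only)."""
    instruction: str
    target_box: tuple[int, int, int, int]  # (left, top, right, bottom)


def infer(model: Callable[[str], Point], samples: list[Sample]) -> list[Point]:
    """Stage 1: run the model once per sample and collect raw predictions."""
    return [model(s.instruction) for s in samples]


def judge(samples: list[Sample], predictions: list[Point]) -> list[bool]:
    """Stage 2: turn each raw prediction into a pass/fail verdict."""
    verdicts = []
    for s, (x, y) in zip(samples, predictions):
        left, top, right, bottom = s.target_box
        verdicts.append(left <= x <= right and top <= y <= bottom)
    return verdicts


def metric(verdicts: list[bool]) -> float:
    """Stage 3: aggregate verdicts into a single score (accuracy here)."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def run_pipeline(model: Callable[[str], Point], samples: list[Sample]) -> float:
    """Chain the three stages; each stage only sees the previous stage's output."""
    return metric(judge(samples, infer(model, samples)))


# Toy usage: a "model" that always clicks at (5, 5).
samples = [
    Sample("tap OK", (0, 0, 10, 10)),
    Sample("tap Cancel", (20, 20, 30, 30)),
]
print(run_pipeline(lambda instruction: (5, 5), samples))  # prints 0.5
```

Keeping the stages separate means stored predictions can be re-judged or re-scored without running the model again, which is one plausible reason to structure an evaluation this way.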

Quick Start & Requirements

Begin by cloning the repository (git clone https://github.com/ZJU-REAL/ClawGUI.git) and navigating into the directory (cd ClawGUI). Each module has independent setup instructions. Python 3.12 is required, and Docker is necessary for parallel environment training. More details are on the Project Page.
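Before following the per-module setup, the two stated requirements can be checked with a short script. The sketch below is an illustration, not part of ClawGUI: the function name `check_prerequisites` is hypothetical, and it treats Python 3.12 as a minimum version, which is an assumption, since the summary does not say whether newer versions are supported.

```python
import shutil
import sys

# Assumption: 3.12 is treated as a minimum; the source only says "required".
REQUIRED_PYTHON = (3, 12)


def check_prerequisites(version_info=sys.version_info, which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version_info[:2]) < REQUIRED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.12 is required, found {found}")
    # Docker is only needed for parallel environment training.
    if which("docker") is None:
        problems.append("docker not found on PATH (needed for parallel environment training)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The `version_info` and `which` parameters default to the real interpreter and PATH lookup; they are exposed only so the check can be exercised without changing the machine it runs on.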

Highlighted Details

  • The end-to-end trained ClawGUI-2B agent achieved a success rate (SR) of 17.1 on MobileWorld, 6.0 points above the 11.1 baseline.
  • ClawGUI-Eval supports research comparability with 95.8% reproduction fidelity across 6 benchmarks and 11+ models.
  • ClawGUI-RL supports training on dozens of parallel Dockerized Android emulators or physical devices.
  • ClawGUI-Agent enables cross-platform device control (Android, HarmonyOS, iOS) via 12+ chat platforms.
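
The gap between the two MobileWorld figures in the first bullet can be stated both as an absolute and as a relative gain. The short calculation below uses only the two numbers quoted above; the helper name `improvement` is hypothetical.

```python
def improvement(score: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain as a fraction of the baseline)."""
    absolute = score - baseline
    relative = absolute / baseline
    return absolute, relative


absolute, relative = improvement(17.1, 11.1)
print(f"absolute gain: {absolute:.1f} points")  # absolute gain: 6.0 points
print(f"relative gain: {relative:.1%}")         # relative gain: 54.1%
```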

Maintenance & Community

Contribution guidelines are available in CONTRIBUTING.md. The project acknowledges dependencies on other open-source efforts. No specific community channels or maintainer details are provided.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, which permits commercial use and integration into closed-source applications.

Limitations & Caveats

The framework currently focuses on mobile GUI agents. Future work includes expanding to desktop/web environments for online RL, enabling on-device ClawGUI-Agent deployment for privacy, and integrating real-time RL. As a research framework, ongoing development and refinement are expected.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 10
  • Star History: 909 stars in the last 21 days