GUI agent resource list
This repository is a curated list of papers, projects, and resources on multimodal Graphical User Interface (GUI) agents. It serves as a knowledge base for researchers and developers building digital assistants that interact with graphical interfaces on platforms such as desktop and mobile devices.
How It Works
The project acts as a central hub that aggregates and categorizes academic papers, open-source projects, and datasets relevant to GUI agents. It covers three main areas: datasets and benchmarks for evaluating agent performance, specific models and agent architectures, and survey papers that give broader overviews of the field. The goal is to support the development of more capable, generalist AI agents that can understand and manipulate GUIs.
Quick Start & Requirements
This repository is a curated list, so it has no installation or execution steps. It serves as a reference guide.
Maintenance & Community
The project is actively maintained and welcomes contributions via issues and pull requests. It credits "Awesome-Video-Diffusion" and "Awesome-MLLM-Hallucination" as the sources of its templates.
Licensing & Compatibility
The repository itself does not specify a license; the listed papers and projects each carry their own. Whether a given resource can be used commercially depends on that resource's license.
Limitations & Caveats
As a curated list, the repository provides no implementation or code for GUI agents. Its value is purely informational: users must explore and integrate the cited resources on their own.