Awesome-GUI-Agents by ZJU-REAL

A curated collection for developing advanced GUI agents

Created 9 months ago
279 stars

Top 93.2% on SourcePulse

Project Summary

Summary

This repository curates resources, tools, and frameworks for developing GUI Agents, targeting AI researchers and developers. It serves as a centralized hub for the latest advancements, papers, datasets, and benchmarks in intelligent agent interaction with graphical user interfaces, aiming to accelerate R&D.

How It Works

The project functions as an organized collection, not an executable framework. It categorizes information around the four core GUI Agent modules: perception, exploration, planning, and interaction. Its primary value lies in extensive, chronologically updated lists of research papers, datasets, and benchmarks, covering reinforcement learning, vision-language models, and multimodal agents across desktop, web, and mobile platforms.

Quick Start & Requirements

Because this is a curated list, there is nothing to install or run. Users must refer to the individual linked papers, GitHub projects, and Hugging Face models for specific implementations, each with its own prerequisites (e.g., Python, libraries, GPU). Relevant links are provided throughout.

Highlighted Details

  • Comprehensive, regularly updated lists of research papers categorized by sub-fields like GUI Grounding, Reinforcement Learning, and Benchmarks.
  • Covers a broad spectrum of GUI agent applications: desktop, web, and mobile environments.
  • Features summaries and analyses of recent AAAI and ICLR accepted papers, highlighting cutting-edge research.
  • Includes links to open-sourced models (e.g., GUI-G2-3B, GUI-G2-7B) and project pages for specific agents.

Maintenance & Community

The repository is actively maintained with frequent updates, including new research papers and weekly summaries. Contributions are welcomed via pull requests. No specific community channels or dedicated maintainer information is provided.

Licensing & Compatibility

No software license is explicitly stated. Without one, compatibility with commercial use or closed-source linking cannot be determined without further investigation.

Limitations & Caveats

This is a curated list, not a runnable framework, so users must integrate the individual components themselves. The lack of explicit licensing is a significant adoption blocker. Because the research moves quickly, listed information may need to be verified against primary sources.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

76 stars in the last 30 days
