Survey paper on OS Agents using MLLMs for computer, phone, and browser automation
This repository accompanies a survey of OS Agents: agents built on multimodal large language models (MLLMs) that automate tasks on computers, phones, and browsers by operating their interfaces. It is aimed at researchers and developers working on AI agents for operating system interaction.
How It Works
The survey categorizes existing research on OS Agents, covering foundation models, agent frameworks, evaluation benchmarks, and safety/privacy considerations. It consolidates the state of the art and discusses methodologies, open challenges, and future directions for building and deploying these agents.
Quick Start & Requirements
This repository is a curated list of papers and resources, not a runnable software agent. No installation or specific requirements are needed to access the information.
Maintenance & Community
The repository is actively updated, with the last update noted as December 13, 2024. Contact information is provided for suggestions and corrections.
Licensing & Compatibility
The repository itself does not specify a license. The content is presented for informational and research purposes.
Limitations & Caveats
The paper associated with this repository was notably rejected by arXiv for not containing "sufficient original or substantive scholarly research," a decision the authors contest. Access to the paper is currently limited to the GitHub repository or OpenReview Archive.