OS-Agent-Survey by OS-Agent-Survey

Survey paper on OS Agents using MLLMs for computer, phone, and browser automation

Created 1 year ago

386 stars

Top 74.5% on SourcePulse

1 Expert Loves This Project

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

This repository accompanies a survey of OS Agents: agents based on Multimodal Large Language Models (MLLMs) that automate tasks on computers, phones, and browsers by interacting with their interfaces. It is aimed at researchers and developers working on AI agents for operating system interaction.

How It Works

The survey categorizes and details existing research on OS Agents, covering foundation models, agent frameworks, evaluation benchmarks, and safety/privacy considerations. It consolidates the state of the art and outlines methodologies, open challenges, and future directions for building and deploying these agents.

Quick Start & Requirements

This repository is a curated list of papers and resources, not a runnable software agent. No installation or specific requirements are needed to access the information.

Highlighted Details

Extensive tables categorize recent foundation models, agent frameworks, and evaluation benchmarks for OS Agents.
Includes a "Full List" section with chronological updates on new research papers in the field.
Details hiring opportunities with OPPO's Personal AI Team for roles in multimodal LLMs and AI Agents.
Provides links to related GitHub repositories and resources for further community engagement.

Maintenance & Community

The repository is actively updated, with the last update noted as December 13, 2024. Contact information is provided for suggestions and corrections.

Licensing & Compatibility

The repository itself does not specify a license. The content is presented for informational and research purposes.

Limitations & Caveats

The paper associated with this repository was rejected by arXiv for not containing "sufficient original or substantive scholarly research," a decision the authors contest. The paper is currently available only through the GitHub repository or the OpenReview Archive.

Explore Similar Projects

awesome-autonomous-gpt by ScarletPan

agency by operand

awesome-computer-use by ranpox

LLM-Agent-Based-Modeling-and-Simulation by tsinghua-fib-lab

Awesome-Agent-Papers by luo-junyu

awesome-multi-agent-papers by kyegomez

Awesome-AI-Agents by Jenqyang

awesome-ai-sdks by e2b-dev

agentops by AgentOps-AI

Agent-Skills-for-Context-Engineering by muratcankoylan

SuperAGI by TransformerOptimus

awesome-ai-agents by e2b-dev