Agent-S by simular-ai

Agentic framework for autonomous computer interaction

Created 1 year ago

9,875 stars

Top 5.1% on SourcePulse

4 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Elvis Saravia

Founder of DAIR.AI

Robert Stojnic

Cocreator of Papers with Code

Alex Chen

Cofounder of Nexa AI

Project Summary

Agent S2 is an open-source framework for building autonomous GUI agents that operate computers the way humans do. It targets AI researchers and developers interested in advanced automation and agent-based systems, and reports state-of-the-art performance on benchmarks such as OSWorld and WindowsAgentArena.

How It Works

Agent S2 uses a compositional generalist-specialist architecture. Large language models (LLMs) handle general reasoning, while a specialized grounding model (such as UI-TARS) handles precise visual interaction and on-screen coordinate prediction. Splitting the work between two models is intended to make task execution robust across diverse graphical interfaces.
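The division of labor described above can be pictured with a small sketch. This is only an illustration under stated assumptions: every name in it (Action, run_step, fake_generalist, fake_specialist) is hypothetical and is not taken from the Agent S2 codebase, and the two "models" are stubs so the example runs without any API keys.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# All names below are hypothetical and exist only for illustration;
# they are not Agent S2's actual classes or functions.

@dataclass
class Action:
    kind: str              # what to do, e.g. "click"
    target: str            # natural-language description of the UI element
    xy: Tuple[int, int]    # pixel coordinates chosen by the specialist


def run_step(
    instruction: str,
    screenshot: bytes,
    generalist: Callable[[str, bytes], Tuple[str, str]],
    specialist: Callable[[str, bytes], Tuple[int, int]],
) -> Action:
    """One step: the generalist decides what to do, the specialist decides where."""
    kind, target = generalist(instruction, screenshot)  # stands in for an LLM call
    xy = specialist(target, screenshot)                 # stands in for a grounding model
    return Action(kind=kind, target=target, xy=xy)


# Stub models so the sketch runs offline.
def fake_generalist(instruction: str, screenshot: bytes) -> Tuple[str, str]:
    return "click", "the Save button"


def fake_specialist(target: str, screenshot: bytes) -> Tuple[int, int]:
    return (640, 360)


action = run_step("Save the document", b"", fake_generalist, fake_specialist)
print(action)
```

Keeping the two roles behind separate callables means either one could be swapped (a different LLM, a different grounding model) without touching the other, which is the flexibility the dual-model design is meant to provide.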

Quick Start & Requirements

Install via pip: pip install gui-agents
Requires API keys for the LLM providers you use (OpenAI, Anthropic, Gemini, Groq, etc.) and possibly for Hugging Face.
Optional: Docker Desktop, needed to run Perplexica for web retrieval.
Setup consists of configuring the API keys and, if web retrieval is wanted, Perplexica.
Docs: S2 blog, S2 Paper, S1 Paper
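Since setup centers on API keys, a quick pre-flight check can save a failed first run. The sketch below assumes the keys are supplied as environment variables and uses variable names that follow common provider conventions (OPENAI_API_KEY, ANTHROPIC_API_KEY, HF_TOKEN). Those names and the helper missing_keys are assumptions for illustration; the project's own documentation is the authority on which variables it reads.

```python
import os

# Assumed variable names, following common provider conventions.
# Check the project documentation for the names it actually reads.
ASSUMED_KEY_VARS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "HF_TOKEN"]


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_keys(ASSUMED_KEY_VARS)
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All assumed keys are set.")
```

Only the keys for the providers actually in use need to be present, so the list would normally be trimmed to match the chosen configuration.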

Highlighted Details

Reports state-of-the-art results on OSWorld (34.5% success rate at 50 steps), WindowsAgentArena (29.8%), and AndroidWorld (54.3%).
Supports multiple LLM providers and custom endpoints for flexibility.
Integrates Perplexica for web-knowledge retrieval to enhance agent capabilities.
Offers both CLI and Python SDK for interaction and development.

Maintenance & Community

Actively developed with recent updates in March 2025.
Papers accepted to ICLR 2025.
Knowledge base downloads are linked from the GitHub releases.

Licensing & Compatibility

The specific license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

On Linux, conda environments may interfere with pyatspi, so installing without a virtual environment is suggested.
Accurate grounding may depend on setting grounding_model_resize_width correctly for the screen resolution in use.
The agent directly executes Python code, so it should be used with care.
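One plausible reason the resize width matters, offered here as an assumption rather than a description of Agent S2's internals: if a grounding model sees a downscaled screenshot, its predicted coordinates live in that smaller image and must be scaled back to the real screen. The helper names resized_height and rescale_point below are hypothetical, and the 1366-pixel width is just an example value.

```python
def resized_height(screen_w: int, screen_h: int, resized_w: int) -> int:
    """Height of the downscaled screenshot when the aspect ratio is preserved."""
    return round(screen_h * resized_w / screen_w)


def rescale_point(x, y, resized_w, resized_h, screen_w, screen_h):
    """Map a point predicted on the resized image back to screen pixels."""
    return round(x * screen_w / resized_w), round(y * screen_h / resized_h)


# Example: a 1920x1080 screen downscaled to an assumed width of 1366.
h = resized_height(1920, 1080, 1366)
center = rescale_point(683, 384, 1366, h, 1920, 1080)
print(h, center)  # 768 (960, 540)
```

If the width used for rescaling does not match the width the model actually saw, every predicted click is shifted proportionally, which would explain why a wrong setting degrades grounding.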

Health Check

Last Commit

4 days ago

Responsiveness

1 day

Star History

292 stars in the last 30 days