Agent-S by simular-ai

Agentic framework for autonomous computer interaction

created 9 months ago
5,921 stars

Top 8.9% on sourcepulse

Project Summary

Agent S2 is an open-source framework for building autonomous GUI agents that interact with computers the way humans do. It targets AI researchers and developers working on advanced automation and agent-based systems, and it reports state-of-the-art performance on benchmarks such as OSWorld and WindowsAgentArena.

How It Works

Agent S2 uses a compositional generalist-specialist architecture: large language models (LLMs) handle general reasoning, while a specialized grounding model (such as UI-TARS) handles precise visual interaction by predicting coordinates on the screen. Splitting the work between the two models lets the agent execute tasks robustly across diverse graphical interfaces.
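
The split between the two models can be illustrated with a minimal, offline sketch. Everything here is hypothetical and is not Agent S2's API: `Step`, `run_task`, `toy_planner`, and `toy_grounder` are illustrative names, and the "models" are plain functions standing in for an LLM (the generalist, which decides what to do and names UI elements in words) and a grounding model (the specialist, which turns a description into pixel coordinates).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical action emitted by the generalist: it says WHAT to interact with
# in natural language, but not WHERE it is on screen.
@dataclass
class Step:
    verb: str       # "click" or "type"
    target: str     # natural-language description of a UI element
    text: str = ""  # payload for "type" actions

# Stand-ins for the two models; in Agent S2 these would be an LLM and a
# grounding model such as UI-TARS. Plain callables keep the sketch runnable.
Planner = Callable[[str], List[Step]]
Grounder = Callable[[str], Tuple[int, int]]

def run_task(task: str, plan: Planner, ground: Grounder) -> List[str]:
    """Compose the generalist (plan) with the specialist (ground)."""
    actions = []
    for step in plan(task):
        x, y = ground(step.target)  # specialist: description -> pixel coordinates
        if step.verb == "click":
            actions.append(f"click({x}, {y})")
        elif step.verb == "type":
            actions.append(f"click({x}, {y}); type({step.text!r})")
        else:
            raise ValueError(f"unsupported verb: {step.verb}")
    return actions

# Toy models for demonstration only (hypothetical).
def toy_planner(task: str) -> List[Step]:
    return [Step("click", "search box"),
            Step("type", "search box", task),
            Step("click", "search button")]

LAYOUT: Dict[str, Tuple[int, int]] = {"search box": (640, 120), "search button": (900, 120)}

def toy_grounder(description: str) -> Tuple[int, int]:
    return LAYOUT[description]

for action in run_task("agent s2", toy_planner, toy_grounder):
    print(action)
```

The planner never deals with pixels and the grounder never deals with task logic, so either side could in principle be swapped for a different model without touching the other.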

Quick Start & Requirements

  • Install via pip: pip install gui-agents
  • Requires API keys for LLM providers (OpenAI, Anthropic, Gemini, Groq, etc.) and potentially Hugging Face.
  • Optional: Docker Desktop for Perplexica (web retrieval).
  • Setup consists of configuring these API keys and, optionally, Perplexica.
  • Docs: S2 blog, S2 Paper, S1 Paper
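
Because setup mostly comes down to providing API keys, a small pre-flight check can report which ones are missing before the agent is launched. This is a sketch under an explicit assumption: the environment-variable names below are the conventional names read by each provider's own SDK, not names confirmed from the Agent S2 README, and `configured_providers` / `missing_keys` are hypothetical helpers.

```python
import os
from typing import Dict, List, Mapping, Optional

# ASSUMPTION: conventional per-provider variable names, not confirmed
# to be the exact names Agent S2 reads.
PROVIDER_ENV_VARS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "huggingface": "HF_TOKEN",
}

def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return providers whose key variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items()
            if env.get(var, "").strip()]

def missing_keys(required: List[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variable names still needed for the providers in `required`."""
    have = set(configured_providers(env))
    return [PROVIDER_ENV_VARS[p] for p in required if p not in have]

if __name__ == "__main__":
    need = missing_keys(["openai", "huggingface"])
    print("missing:", need or "none")
```

Passing a plain dict instead of `os.environ` keeps the check testable without touching the real environment.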

Highlighted Details

  • Achieves SOTA results on OSWorld (34.5% success rate at 50 steps), WindowsAgentArena (29.8%), and AndroidWorld (54.3%).
  • Supports multiple LLM providers and custom endpoints for flexibility.
  • Integrates Perplexica for web-knowledge retrieval to enhance agent capabilities.
  • Offers both CLI and Python SDK for interaction and development.
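
Support for custom endpoints typically means a model can be served either by a hosted provider or by a user-supplied URL (for example, a self-hosted OpenAI-compatible server). The sketch below shows one way to resolve which endpoint a request should go to. It is hypothetical: `EngineConfig` and `resolve_endpoint` are illustrative names, not Agent S2's configuration API; only the two default base URLs are the providers' public API hosts.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical config object; field names are illustrative, not Agent S2's API.
@dataclass(frozen=True)
class EngineConfig:
    provider: str                   # e.g. "openai", "anthropic", or "custom"
    model: str
    base_url: Optional[str] = None  # an explicit URL overrides the provider default

# Public API hosts for two hosted providers.
DEFAULT_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "anthropic": "https://api.anthropic.com",
}

def resolve_endpoint(cfg: EngineConfig) -> str:
    """Prefer an explicit custom endpoint, else fall back to the provider default."""
    if cfg.base_url:
        return cfg.base_url.rstrip("/")
    try:
        return DEFAULT_BASE_URLS[cfg.provider]
    except KeyError:
        raise ValueError(f"provider {cfg.provider!r} needs an explicit base_url") from None
```

For example, `EngineConfig("custom", "my-model", "http://localhost:8000/v1/")` resolves to `http://localhost:8000/v1`, while a provider with neither a default nor an explicit URL fails early with a clear error instead of at request time.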

Maintenance & Community

  • Actively developed with recent updates in March 2025.
  • Papers accepted to ICLR 2025.
  • The README links to GitHub releases for downloading the agent's knowledge base.

Licensing & Compatibility

  • The README does not state a license, and it does not address commercial use or closed-source linking.

Limitations & Caveats

  • The README warns Linux users that conda environments may interfere with pyatspi and suggests installing without a virtual environment.
  • Accurate grounding may depend on setting grounding_model_resize_width correctly for the screen resolution in use.
  • The agent executes Python code directly, so use it with care.
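
To see why grounding_model_resize_width matters, consider the coordinate arithmetic. The sketch below assumes (this is not confirmed by the README) that the screenshot is resized to that width with its aspect ratio preserved before grounding, and that the grounding model returns pixel coordinates in the resized image, which must then be scaled back to native screen pixels. `resized_dims` and `to_screen` are hypothetical helpers.

```python
from typing import Tuple

def resized_dims(screen_w: int, screen_h: int, resize_width: int) -> Tuple[int, int]:
    """Size of the screenshot after an aspect-preserving resize to resize_width."""
    scale = resize_width / screen_w
    return resize_width, round(screen_h * scale)

def to_screen(pred_x: float, pred_y: float,
              screen_w: int, screen_h: int, resize_width: int) -> Tuple[int, int]:
    """Map a point predicted on the resized image back to native screen pixels."""
    rw, rh = resized_dims(screen_w, screen_h, resize_width)
    x = round(pred_x * screen_w / rw)
    y = round(pred_y * screen_h / rh)
    # Clamp in case the model predicts a point slightly outside the image.
    return min(max(x, 0), screen_w - 1), min(max(y, 0), screen_h - 1)

# On a 2560x1440 screen, the center of a 1366-wide image maps back correctly:
print(to_screen(683, 384, 2560, 1440, 1366))  # (1280, 720)
# Assuming the wrong width (1920) for the same prediction misses the target:
print(to_screen(683, 384, 2560, 1440, 1920))  # (911, 512)
```

Under these assumptions, a mismatch between the width the model actually saw and the width used for rescaling shifts every click by a constant factor, which is consistent with the caveat that grounding accuracy depends on this setting.
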

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 6
  • Issues (30d): 4
  • Star History: 1,974 stars in the last 90 days
