Agent-S  by simular-ai

Agentic framework for autonomous computer interaction

Created 11 months ago
6,267 stars

Top 8.3% on SourcePulse

Project Summary

Agent S2 is an open-source framework for building autonomous GUI agents that operate computers the way humans do. It targets AI researchers and developers interested in advanced automation and agent-based systems, and it reports state-of-the-art performance on benchmarks such as OSWorld and WindowsAgentArena.

How It Works

Agent S2 employs a compositional generalist-specialist architecture. It leverages large language models (LLMs) for general reasoning and a specialized grounding model (like UI-TARS) for precise visual interaction and coordinate prediction on the screen. This dual-model approach allows for robust task execution across diverse graphical interfaces.
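
The split between a generalist planner and a specialist grounder can be illustrated with a minimal Python sketch. Every name here (Action, plan_next_action, ground, step) is hypothetical and is not part of the gui-agents API; the two model calls are replaced by stubs so the control flow can be run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Action:
    kind: str    # what to do, e.g. "click"
    target: str  # natural-language description of the UI element


def plan_next_action(task: str, observation: str) -> Action:
    """Hypothetical stand-in for the generalist LLM: decides WHAT to do."""
    return Action(kind="click", target=f"the control relevant to: {task}")


def ground(target: str, screen_size: Tuple[int, int]) -> Tuple[int, int]:
    """Hypothetical stand-in for the grounding model: decides WHERE on screen.

    A real grounding model would look at a screenshot; this stub simply
    returns the centre of the screen.
    """
    width, height = screen_size
    return width // 2, height // 2


def step(
    task: str,
    observation: str,
    screen_size: Tuple[int, int],
    planner: Callable[[str, str], Action] = plan_next_action,
    grounder: Callable[[str, Tuple[int, int]], Tuple[int, int]] = ground,
) -> Dict[str, object]:
    """One agent step: plan in language, then resolve the target to pixels."""
    action = planner(task, observation)
    x, y = grounder(action.target, screen_size)
    return {"kind": action.kind, "x": x, "y": y}


result = step("open settings", "desktop is visible", (1920, 1080))
print(result)
```

Keeping the planner and grounder behind plain callables mirrors the idea that either model can be swapped without touching the other.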

Quick Start & Requirements

  • Install via pip: pip install gui-agents
  • Requires API keys for the LLM providers you use (OpenAI, Anthropic, Gemini, Groq, etc.) and possibly Hugging Face.
  • Optional: Docker Desktop, needed to run Perplexica for web retrieval.
  • Docs: S2 blog, S2 Paper, S1 Paper
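
Since setup depends on API keys being configured, a small pre-flight check can catch a missing key before the agent starts. The sketch below is illustrative only: the helper missing_api_keys is hypothetical, and the environment variable names are the conventional ones for these providers, assumed here rather than taken from the project's documentation.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumed, conventional variable names; confirm against the project README.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def missing_api_keys(
    providers: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the env var names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    missing = []
    for provider in providers:
        var_name = PROVIDER_ENV_VARS[provider]
        if not env.get(var_name):
            missing.append(var_name)
    return missing


# Example with an explicit mapping instead of the real environment.
print(missing_api_keys(["openai", "anthropic"], {"OPENAI_API_KEY": "sk-test"}))
```

Passing the environment as a mapping keeps the check easy to test without modifying real environment variables.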

Highlighted Details

  • Achieves SOTA results on OSWorld (34.5% success rate at 50 steps), WindowsAgentArena (29.8%), and AndroidWorld (54.3%).
  • Supports multiple LLM providers and custom endpoints for flexibility.
  • Integrates Perplexica for web-knowledge retrieval to enhance agent capabilities.
  • Offers both CLI and Python SDK for interaction and development.

Maintenance & Community

  • Actively developed with recent updates in March 2025.
  • Papers accepted to ICLR 2025.
  • Links to GitHub releases for knowledge base downloads.

Licensing & Compatibility

  • The README does not explicitly state a license, so the terms for commercial use or closed-source linking are unclear.

Limitations & Caveats

  • On Linux, conda environments may interfere with pyatspi; installing without a virtual environment is suggested.
  • Accurate grounding may depend on setting grounding_model_resize_width correctly for the screen resolution in use.
  • The agent directly executes Python code on the machine it runs on, so use it with care.
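
To see why the resize width matters for grounding, consider a grounding model that predicts coordinates on a screenshot resized to a fixed width. Those coordinates must be scaled back to the real screen before any click. The sketch below assumes the resize preserves aspect ratio; the function rescale_point is hypothetical and only illustrates the arithmetic, not the project's actual implementation.

```python
from typing import Tuple


def rescale_point(
    x: float,
    y: float,
    resize_width: int,
    screen_width: int,
) -> Tuple[int, int]:
    """Map a point predicted on the resized screenshot back to screen pixels.

    Assumes the screenshot was resized with its aspect ratio preserved, so a
    single scale factor applies to both axes.
    """
    scale = screen_width / resize_width
    return round(x * scale), round(y * scale)


# A point at (683, 384) on a 1366-wide resized image, for a 2732-wide screen.
print(rescale_point(683, 384, resize_width=1366, screen_width=2732))
```

If the configured resize width does not match what the grounding model actually saw, the scale factor is wrong and every click lands off target by a proportional amount.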

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 7

Star History

  • 142 stars in the last 30 days
