Controllable world for benchmarking interactive coding agents
AppWorld provides a high-fidelity, controllable simulated environment for benchmarking interactive coding agents. It features 9 day-to-day applications with over 450 APIs, simulating ~100 users, enabling agents to perform complex, interactive coding tasks. This platform offers a standardized benchmark for evaluating agent capabilities in realistic scenarios.
How It Works
The system simulates a world of apps and people, allowing agents to interact via Python code making API calls. It supports stateful execution, maintaining context across interactions, and provides a comprehensive set of 457 APIs across 9 applications. This design facilitates the development and rigorous evaluation of agents for complex, multi-step tasks.
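Stateful execution can be illustrated with a toy sketch. This is not AppWorld's implementation: ToyWorld and its execute method are hypothetical names used only for illustration. The point is that variables defined by one code snippet remain available to the next one.

```python
import contextlib
import io


class ToyWorld:
    """Hypothetical stand-in for a stateful execution environment."""

    def __init__(self):
        # One namespace shared by every execute() call, so state persists.
        self._namespace = {}

    def execute(self, code: str) -> str:
        """Run code in the shared namespace and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self._namespace)
        return buffer.getvalue()


world = ToyWorld()
# First interaction defines a variable.
world.execute("playlist = ['song_a', 'song_b']")
# Second interaction can still see it, because the namespace is reused.
output = world.execute("print(len(playlist))")
```

A real benchmark environment would additionally need sandboxing and a simulated backend for the apps; this sketch only shows the persistence of context across calls.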
Quick Start & Requirements
Install the package with pip install appworld, then run appworld install to unpack the encrypted code and appworld download data to fetch the benchmark datasets. Python 3.11+ is a prerequisite. Key resources include a website, task/API explorers, a leaderboard, and extensive documentation.
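A small pre-flight check can confirm the Python prerequisite before running the setup commands. This is only a sketch: the helper name meets_python_requirement and the constants are hypothetical, and the commands are listed exactly as given above.

```python
import sys

MIN_PYTHON = (3, 11)  # prerequisite stated above

# Setup commands, in the order given above.
SETUP_COMMANDS = [
    "pip install appworld",
    "appworld install",
    "appworld download data",
]


def meets_python_requirement(version=None) -> bool:
    """Return True if the (major, minor, ...) version is 3.11 or newer."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if meets_python_requirement():
        print("Python version OK. Run these commands in order:")
        for command in SETUP_COMMANDS:
            print("  " + command)
    else:
        print("Python 3.11 or newer is required.")
```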
Highlighted Details
Code execution through world.execute() with state persistence.
Maintenance & Community
Hosted on GitHub, the project appears actively maintained with clear channels for feedback and contributions via issues. Specific community links (e.g., Discord, Slack) are not explicitly mentioned.
Licensing & Compatibility
Public components are Apache 2.0 licensed. Protected portions (app/task specifics) are also Apache 2.0, with the added condition that any publicly redistributed derivatives must remain encrypted. LLM training is permitted.
Limitations & Caveats
Key app/task data is stored in encrypted .bundle files, which limits direct inspection on GitHub. Test sets provide only evaluation programs to prevent leakage. The README cautions against posting extracted .bundle content online. While safety features are robust, Docker is recommended for maximum isolation. Realistic state reversion is not supported.